ABSTRACT


Military  aircraft  maintenance  methods  are  moving  from  practices  based  on  hard-time
inspection  and  replacement  intervals  to  Condition  Based  Maintenance  (CBM).  CBM  makes
it  possible  to  forgo  scheduled  maintenance  on  components  or
systems  that  are  not  in  need  of  maintenance  or  replacement.  CBM  reduces  maintenance 
efforts  and  component  replacement  and  increases  readiness  and  safety. 

Goodrich  Corporation  has  developed  the  Integrated  Mechanical  Diagnostics 
Health  and  Usage  Management  System  (IMD-HUMS)  to  support  CBM  in  helicopters. 
Great  benefits  in  several  maintenance  practices,  readiness  and  safety  have  already  been 
realized  by  the  UH-60L  helicopter  military  unit  equipped  with  the  IMD-HUMS  system. 

The  total  potential  of  the  system,  for  the  components  observed  by  the  IMD- 
HUMS,  however,  has  not  yet  been  achieved.  The  IMD-HUMS  gathers  an  enormous 
amount  of  data  on  the  condition  of  these  components  and  systems.  The  meaning  and  full 
potential  of  all  this  data  has  not  yet  been  fully  realized  because  to  date,  this  data  has  never 
been  coupled  with  corresponding  maintenance  data. 

The  purpose  of  this  research  is  to  conduct  and  document  statistical  analysis  of 
IMD-HUMS  produced  data  with  corresponding  maintenance  data  of  observed  component 
failures.  Statistical  applications  of  logistic  regression  and  classification  trees  are  explored 
to  predict  failures.  The  approaches  used  in  the  exploration  of  the  IMD-HUMS 
acquisition  data  sets  are  based  on  sixty  electrical  generators  from  thirty  aircraft,  six  of 
which  displayed  degradation  or  failure  and  hence  required  maintenance  actions.  This 
approach  is  promising.  With  it  we  accurately  predict  two  previously  undocumented 
failures. 
EXECUTIVE  SUMMARY


Military  aircraft  maintenance  methods  are  moving  from  practices  based  on  hard-time
inspection  and  replacement  intervals  to  Condition  Based  Maintenance  (CBM).  The  latter
practice  makes  it  possible  to  forgo  scheduled  maintenance  on
components  or  systems  which  have  reached  their  high  times  but  are  not  in  need  of 
maintenance  or  replacement.  Benefits  of  CBM  are  the  minimization  of  maintenance 
efforts  and  component  replacement  along  with  an  increase  in  readiness  and  safety. 

Goodrich  Corporation  has  developed  the  Integrated  Mechanical  Diagnostics 
Health  and  Usage  Management  System  (IMD-HUMS)  for  the  practices  of  CBM  in 
helicopters.  The  UH-60L  helicopter  unit  equipped  with  the  IMD-HUMS  has  already
realized  great  benefits  in  several  maintenance  practices,  readiness,  and  safety.

The  total  potential  of  the  system,  with  regard  to  these  benefits  for  the  multiple
components  observed  by  the  IMD-HUMS,  however,  is  not  yet  achieved.  The  IMD-HUMS
gathers  a  great  deal  of  pertinent  data  on  the  condition  of  multiple
components  and  systems,  but  the  meaning  and  full  potential  of  all  this  data  is  not  yet
realized.

The  purpose  of  this  research  is  to  conduct  and  document  the  statistical  analysis  of 
IMD-HUMS  produced  data.  Statistical  applications  of  logistic  regression  and  random 
forest  of  classification  trees  are  explored.  The  approaches  used  in  the  exploration  of  the 
IMD-HUMS  acquisition  data  sets  are  based  on  six  electrical  generators  which  displayed 
degradation  or  failure — and  hence  required  maintenance  actions — compared  with  sixty 
others  which  did  not.  This  thesis  focuses  on  using  the  combination  of  resulting  vibratory 
patterns  and  maintenance  records  from  one  type  of  component,  the  electrical  generator  of 
the  UH-60L  helicopter,  to  forecast  the  need  for  maintenance.  Data  acquired  from  the
IMD-HUMS  is  used  in  an  attempt  to  understand  and  predict  the  health  of
the  UH-60L  electrical  generator,  in  the  hope  of  gaining  insight  into  developing
component  health  predictions  from  IMD-HUMS  data  for  other  components.


This  thesis  discusses  how  the  resulting  predicted  health  classifications  compare  to
how  each  of  the  generators  is  currently  classified.  In  this  process,  some  surprising  cases
of  generator  health  classification  are  uncovered.  One  generator  that  was  wrongly
presumed  to  be  bad  and  another  that  was  wrongly  assumed  to  be
good  were  both  classified  correctly  by  this  study's  classification  scheme.  The  thesis
demonstrates  that  two  different  models — logistic  regression  and  random  forest  of 
classification  trees — can  be  fit  using  IMD-HUMS  data  collected  with  known  cases  of 
failed  generators  and  properly  operating  generators.  These  models  can  predict  the  overall 
state  of  a  UH-60L  electrical  generator. 
I.  INTRODUCTION


There  are  over  12,000  aircraft  in  the  U.S.  military's  inventory,  with  nearly  2,400 
in  the  Navy  and  Marine  Corps  (International  Institute  for  Strategic  Studies,  2005).  In 
Fiscal  Year  2005,  Congress  obligated  over  5.29  billion  dollars  toward  the  operation  and 
maintenance  of  these  Naval  aircraft,  with  1.08  billion  dollars  of  this  money  obligated 
strictly  to  intermediate  and  depot-level  maintenance  (Office  of  the  Undersecretary  of 
Defense,  2005). 

To  put  these  operation  and  maintenance  costs  into  perspective,  consider  that 
flying  a  single  CH-53E  helicopter  for  one  flight  hour  costs  $14,000  and  requires  44 
maintenance  man-hours  (http://www.aviationtoday.com,  Nov  2005).  A  solution  to  these 
high  costs  may  be  found  through  the  services'  concerted  efforts  to  move  away  from  a 
scheduled  maintenance  approach  toward  a  combination  of  scheduled  and  condition  based 
maintenance  (CBM).  In  fact,  this  has  been  mandated  as  the  Department  of  Defense 
(DoD)  required  strategy  to  improve  aircraft  supportability  (DoD  Instruction  5000.2,  May 
2003).  For  helicopters,  this  means  monitoring  the  conditions  of  the  mechanical 
components,  which  account  for  70%  of  maintenance  costs  (Ruben  &  Rossi,  2003). 

Monitoring  of  these  components  is  best  accomplished  through  the  collection  of 
these  components'  vibratory  patterns.  Goodrich  Corporation  has  developed  the  Integrated 
Mechanical  Diagnostics  Health  and  Usage  Management  System  (IMD-HUMS)  which 
collects  and  analyzes  a  helicopter  component's  vibrations  for  use  in  CBM.  The  system 
has  been  installed  and  operational  for  over  two  years  in  30  U.S.  Army  UH-60L 
helicopters.  This  provides  the  opportunity,  for  the  first  time,  to  investigate  data  produced 
by  IMD-HUMS  installed  in  a  large  fleet  of  operational  helicopters,  rather  than  data  from 
test  stand  mounted  fault-induced  components  or  test-bed  aircraft. 

The  IMD-HUMS  is  worthy  of  study  because  major  economic,  operational  and 
safety  benefits  can  be  realized  by  incorporating  such  CBM  systems  into  aircraft 
maintenance  practices.  This  thesis  focuses  on  using  the  combination  of  resulting 
vibratory  patterns  and  maintenance  records  from  one  type  of  component,  the  electrical 
generator,  to  forecast  the  need  for  maintenance.  The  data  is  explored  and  analyzed  using
statistical  approaches  in  hopes  of  gaining  insights  in  developing  component  health 
predictions  from  IMD-HUMS  data. 


A.  LITERATURE  REVIEW 

Numerous  papers  describe  the  IMD-HUMS;  however,  very  little  work  concerns 
specific  analysis  of  operational  helicopter  vibratory  patterns,  and  even  less  focuses  on 
relating  changes  in  the  vibratory  patterns  to  actual  operational  maintenance  events. 

The  "Systems  Users  Manual  for  IMD-HUMS"  (U.S.  Army  Publication,  2005)  and 
the  "P3I  VPU/DTD  Software  Requirements  Specifications"  (Goodrich  Publication,  2001) 
provide  the  basic  terminology,  concept  of  operations,  and  an  explanation  of  the  physical 
measurements  regarding  the  IMD-HUMS.  Understanding  the  physics  behind  the 
vibratory  patterns  is  essential  for  predicting  component  health. 

Various  papers  and  briefs  written  primarily  by  employees  of  Goodrich 
Corporation  and  the  IMD-HUMS  Program  Managers  Office  provide  an  overview  of  the 
uses  and  issues  concerning  the  IMD-HUMS.  Hess,  Duke  and  Kogut  (2005)  provide  a 
good  overview  of  the  development  history,  terms,  functionality,  and  potential  of  the 
IMD-HUMS.  The  master’s  thesis  by  Revor  (2004)  uses  discrete  event  simulation  backed 
by  Naval  Aviation  Logistics  Data  Analysis  (NALDA)  databases  to  investigate  the  cost  benefits
of  incorporating  the  IMD-HUMS  into  helicopter  rotor  track  and  balance  maintenance 
actions.  Revor's  simulation  supports  the  idea  that  using  the  IMD-HUMS  will  decrease 
costs  and  maintenance  efforts. 

Several  Goodrich  papers  also  discuss  the  mathematical  concepts  and  algorithmic 
inner  workings  of  the  IMD-HUMS  in  detail.  These  papers  provide  insight  into  the 
complexity  and  potential  of  the  system;  for  example,  see  Bechhoefer  and  Power  (2002) 
and  Hochmann  (2004).  The  latter  paper  addresses  the  issue  of  variability  among 
vibratory  pattern  observations  which  originate  from  seemingly  identical  operating 
conditions. 
The  master’s  thesis  by  Elyurek  (2003)  presents  empirical  studies  of  vibratory
patterns.  Elyurek  uses  Box-Jenkins  time  series  modeling  with  regression  to
determine  vibration  thresholds  for  gear  fault  identification.  The  study  is  based  on
operational  data  produced  by  a  test  CH-53  helicopter  with  the  IMD-HUMS  installed.  He
concludes  that  his  model  could  not  match  the  required  negligible  alarm  rate  due  to  the
small  sample  size  available.

Only  recently  has  it  been  possible  to  look  at  vibratory  patterns  matched  with 
corresponding  operational  maintenance  events.  Wright  (2005)  investigates  several  cases 
of  maintenance  discrepancy  detections  made  by  the  30  UH-60L  helicopters  with  IMD- 
HUMS  installed.  In  three  particular  cases,  the  IMD-HUMS  data  indicated  that  the 
generator  was  about  to  fail  before  it  actually  did.  The  paper  explains  the  subsequent 
investigation  and  facts  concerning  these  generators.  The  apparent  relationship  between 
changes  in  vibratory  patterns  and  the  failed  generators  described  by  Wright  provides  the 
motivation  for  choosing  UH-60L  generators  for  this  study.  In  addition,  the  paper 
discusses  processes  developed  to  incorporate  the  IMD-HUMS  data  into  beneficial 
maintenance  practices. 


B.  RESEARCH  FOCUS 

With  the  exception  of  Wright's  paper,  there  are  no  published  works  that
empirically  relate  vibratory  patterns  to  documented  operational  maintenance  events.  The
potential  of  CBM,  and  of  the  IMD-HUMS  in  particular,  has  not  yet  been  fully  realized.
The  objective  of  CBM  is  to  know,  from  the  data  collected  by  sensor  readings,  when  a
component  or  system  needs  replacement  or  maintenance.  A  simple  analogy  to  CBM  is
a  medical  doctor  observing  a  person’s  temperature,  blood  pressure  and  heart  rate.
The  readings  could  mean  many  different  things  under  different  circumstances,  but  an
experienced  doctor  would  be  able  to  tell  whether  that  person  is  in  good  health,  and
specifically  what  medical  actions  to  take.  Now,  imagine  the  first  time  in  history  a  doctor
listened  to  a  heartbeat.  He  knew  this  information  was  important  and  could  explain  a
great  deal  concerning  a  patient's  health,  but  everything  the  patient's  heartbeat  could  tell  the
doctor  was  not  yet  known.  This  is  where  we  are  now  with  much  of  the  data  resulting  from
the  IMD-HUMS.  The  IMD-HUMS  data  tells  the  user  something  about  each  monitored
component’s  health  and  future  health,  but  exactly  what  it  tells  is  deserving  of  study.

This  issue  is  addressed  in  this  thesis.  Data  acquired  from  a  CBM-based  system 
(the  IMD-HUMS)  will  be  used  in  an  attempt  to  understand  and  predict  the  state, 
condition  and  performance  of  a  component  (the  UH-60L  electrical  generator). 

The  UH-60L  electrical  generators  were  chosen  for  study  for  two  reasons.  First, 
during  the  two  years  the  IMD-HUMS  was  installed  on  the  helicopters,  six
generators  had  to  be  removed  from  operation  because  of  some  fault,  and
60  generators  were  deemed  to  be  working  properly.  This  provides  a  data  set  in
which  generators  can  be  classified  as  "bad"  (removed  for  some  fault)  or  "good"
(working  properly).  Second,  the  electrical  generators  are  relatively  simple  components  to 
study  when  compared  to  aircraft  engines  or  transmissions.  The  generators  have  fewer 
moving  parts  which  produce  vibrations  and  are  much  less  likely  to  be  affected  by  factors 
such  as  flight  regime  or  torque  settings. 

C.  APPROACH 

This  thesis's  approach  to  assessing  the  generators'  health  differs  somewhat
from  the  current  method  of  health  assessment  used  with  the  IMD-HUMS.  Currently  a
component's  overall  health  is  assessed  by  using  a  Health  Indicator  (HI)  for
that  component.  Each  component  HI  is  computed  from  a  subset  of  IMD-HUMS
vibratory  readings  known  as  Condition  Indicators  (CI).  A  component's  HI  is  a  statistic
which  summarizes  when  the  CI  corresponding  to  that  component  have  unusual  values
compared  to  the  historical  distributions  of  these  CI.  The  CI  readings  come  only  from
specific  parts  within  the  component  itself.  For  instance,  the  generator's  health  is
monitored  by  the  HI  computed  from  CI  originating  from  the  generator's  shaft.  Rather
than  attempt  to  supplant  the  current  method  by  using  different  CI  or  by  changing  how  the
HI  are  computed  from  the  CI,  the  approach  used  in  this  thesis  augments  the  current
method.
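
The idea of an HI as a summary of how unusual a component's CI are, relative to their historical distributions, can be illustrated with a minimal sketch. The actual IMD-HUMS computation is proprietary and is not reproduced here; the z-score rule, the three-standard-deviation cutoff and the CI names below are all assumptions made purely for illustration.

```python
from statistics import mean, stdev

def health_indicator(history, current, z_limit=3.0):
    """Return the fraction of CI whose current value is unusual.

    history: dict mapping a CI name to a list of past readings.
    current: dict mapping the same CI names to the latest reading.
    z_limit: hypothetical cutoff, in historical standard deviations.
    """
    unusual = 0
    for name, past in history.items():
        mu, sigma = mean(past), stdev(past)
        # A CI counts as unusual when it lies far from its historical mean.
        if sigma > 0 and abs(current[name] - mu) / sigma > z_limit:
            unusual += 1
    return unusual / len(history)

# Hypothetical CI names and readings, not taken from the IMD-HUMS.
history = {
    "shaft_order_1": [0.10, 0.12, 0.11, 0.09, 0.10],
    "shaft_order_2": [0.05, 0.06, 0.05, 0.04, 0.05],
}
healthy = {"shaft_order_1": 0.11, "shaft_order_2": 0.05}
worn = {"shaft_order_1": 0.30, "shaft_order_2": 0.05}

hi_healthy = health_indicator(history, healthy)  # no CI is unusual
hi_worn = health_indicator(history, worn)        # one of the two CI is unusual
```

In this toy form a larger value means more CI are behaving abnormally; a fielded system would likely weight the CI and account for their correlations rather than simply counting exceedances.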

First,  to  assess  generator  health  a  broader  set  of  CI  is  used.  Not  only  are  CI
originating  from  the  generator  shaft  vibratory  patterns  used,  but  CI  from  the  vibratory
patterns  of  the  nearby  supporting  gear  and  bearing  are  also  used.  Second,  for  the  UH-60L
generators,  there  are  two  years  of  empirical  IMD-HUMS  data  along  with
corresponding  maintenance  records  for  30  aircraft,  each  with  two  generators.  This  data 
set  should  be  large  enough  to  contain  examples  of  the  most  common  failure  modes  from 
generators  along  with  their  corresponding  vibratory  patterns.  The  data  set  also  contains 
examples  of  healthy  generators  along  with  their  vibratory  patterns.  A  classification 
scheme  is  developed  based  on  these  examples  of  good  and  bad  generators  and  their 
vibratory  patterns  as  measured  by  the  approximately  170  CI  related  to  the  generator  shaft,
bearing  and  gear. 

The  classification  scheme  uses  a  logistic  regression  fit  to  the  data  which  estimates 
the  probability  of  the  generator  being  bad  as  a  function  of  the  CI.  This  logistic  regression 
fit  does  not  explicitly  take  into  account  the  time  series  nature  of  CI  readings.  Therefore, 
as  a  basis  for  classification,  a  loess  smoother  of  the  probabilities  predicted  over  time  by 
the  logistic  regression  is  used.  To  test  the  predictive  ability  of  the  classification  scheme, 
the  generator  data  is  divided  into  two  sets:  a  training  set  and  an  experimental  set.  Only 
the  training  set  is  used  in  the  logistic  regression  fit.  The  classification  scheme  is  then 
tested  on  the  experimental  data  which  contains  a  bad  generator,  several  good  generators, 
and  generators  of  questionable  health. 
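
The steps of this scheme (fit a logistic regression on training acquisitions, predict a probability for each acquisition of an unseen generator, smooth those probabilities over time, then classify) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the data are synthetic, only two standardized CI are used instead of roughly 170, the fit uses plain gradient ascent, the smoother is a basic local linear loess with tricube weights, and the rule "bad if the final smoothed probability exceeds 0.5" is hypothetical.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def predict(w, x):
    """Estimated probability that an acquisition comes from a bad generator."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, steps=2000, rate=0.1):
    """Fit logistic regression by gradient ascent on the mean log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + rate * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def loess(times, values, span=0.3):
    """Local linear regression with tricube weights; assumes distinct times."""
    k = max(3, int(span * len(times)))
    fitted = []
    for t0 in times:
        near = sorted(range(len(times)), key=lambda i: abs(times[i] - t0))[:k]
        h = max(abs(times[i] - t0) for i in near)
        wts = [(1 - (abs(times[i] - t0) / h) ** 3) ** 3 for i in near]
        sw = sum(wts)
        mt = sum(w * times[i] for w, i in zip(wts, near)) / sw
        mv = sum(w * values[i] for w, i in zip(wts, near)) / sw
        sxx = sum(w * (times[i] - mt) ** 2 for w, i in zip(wts, near))
        sxy = sum(w * (times[i] - mt) * (values[i] - mv) for w, i in zip(wts, near))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mv + slope * (t0 - mt))
    return fitted

rng = random.Random(0)

def noise():
    return rng.gauss(0.0, 0.3)

# Training set: synthetic acquisitions with two standardized CI (0 = good, 1 = bad).
X_train = ([[-1 + noise(), -1 + noise()] for _ in range(30)]
           + [[1 + noise(), 1 + noise()] for _ in range(30)])
y_train = [0] * 30 + [1] * 30
w = fit_logistic(X_train, y_train)

# Experimental generator, not used in the fit: its CI drift upward over time.
times = list(range(40))
X_exp = [[-1 + 2 * t / 39 + noise(), -1 + 2 * t / 39 + noise()] for t in times]
probs = [predict(w, x) for x in X_exp]
# Loess output is not confined to [0, 1], so clip before reading it as a probability.
smoothed = [min(1.0, max(0.0, s)) for s in loess(times, probs)]
classified_bad = smoothed[-1] > 0.5  # hypothetical decision rule
```

Smoothing matters because a single acquisition can produce a spuriously high or low probability; the smoothed curve reflects the sustained trend across acquisitions.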

D.  OUTLINE  OF  STUDY 

Chapter  II  gives  the  background  needed  to  understand  this  study.  In  particular  it 
provides  an  overview  of  CBM,  IMD-HUMS,  the  UH-60L  helicopter  and  its  electrical 
generators.  This  chapter  also  provides  fundamental  knowledge  concerning  the  CI  and  HI 
used  in  this  study.  This  is  important  because  the  data  set  of  flight  regimes  and  vibratory 
patterns  for  30  aircraft  over  two  years  of  operation  is  very  large.  It  contains  both  a  large 
number  of  variables  and  a  large  number  of  records. 

Chapter  III  describes  the  data  set  and  how  it  is  partitioned  into  the  training  and 
experimental  sets.  The  vibratory  patterns  and  flight  regime  data  are  also  studied  for  both 
good  and  bad  generators  in  the  training  set.  This  analysis  chapter  begins  with  graphical 
exploration  to  investigate  differences  in  the  training  data  among  the  good  and  bad
generators,  as  well  as  differences  among  just  the  bad  generators. 

In  the  second  part  of  the  analysis,  a  parametric  model,  logistic  regression 
(Montgomery,  2001),  is  fit  to  the  training  data.  As  a  check  of  the  estimated  probabilities 
of  the  generator  being  bad,  a  nonparametric  model,  a  random  forest  of  classification  trees 
(Berk,  2005),  is  also  fit  to  the  training  data.  These  two  models  give  respectively  an 
estimated  probability  of  a  generator  being  bad  and  a  classification  of  a  generator  being 
bad  or  good  for  each  acquisition. 

In  Chapter  IV  the  logistic  regression  and  random  forest  classifiers  are  applied  to 
each  of  the  generators  used  in  the  study.  The  probabilities  of  being  predicted  bad  from 
the  logistic  regression  are  plotted  over  time  and  then  smoothed.  These  smoothed  versions 
are  used  to  classify  each  generator  in  the  training  and  experimental  data  set  as  good  or 
bad.  The  end  of  the  chapter  discusses  how  these  predicted  classifications
compare  to  how  each  of  the  generators  is  actually  classified.  In  this  process,  some
surprising  cases  of  generator  health  classification  are  uncovered.  One  generator  that
was  wrongly  presumed  to  be  bad  and  another  that  was  wrongly
assumed  to  be  good  were  both  classified  correctly  by  this  study's  approach.

Conclusions  and  recommendations  for  further  study  are  given  in  the  final  chapter. 
II.  BACKGROUND:  SYSTEMS  AND  CONCEPTS


This  section  introduces  and  explains  the  concepts  of  scheduled  maintenance, 
CBM  and  vibration  analysis.  A  description  and  the  principles  of  operation  of  the  IMD- 
HUMS  are  given  because  this  system  acquires,  manipulates  and  stores  the  data.  A  brief 
description  of  the  UH-60L  helicopter,  its  electrical  generators  and  supporting  components 
is  included  since  they  are  the  source  of  the  studied  data. 


A.  AIRCRAFT  MAINTENANCE  CONCEPTS 

There  are  two  very  different  concepts  in  the  way  military  aircraft  maintenance  is 
performed.  The  first,  scheduled  maintenance,  uses  traditional  methods  based  upon  time 
of  usage.  The  other  is  CBM,  which  is  heavily  dependent  upon  vibration  monitoring  and
diagnostics. 


1.  Scheduled  Maintenance 

Currently,  maintenance  on  most  military  aircraft  is  performed  under  the
concept  of  scheduled  maintenance,  or  Time  Before  Overhaul  (TBO).  One  of
two  cases  results  in  a  required  maintenance  action:  a  component  or  system
noticeably  fails,  or  operates  in  a  noticeably  degraded  mode,  in  which  case  it  is  replaced
or  fixed;  or  the  component  or  system  reaches  a  pre-determined  amount  of  usage,  at  which
time  it  is  replaced  or  inspected.  The  inspections  or  replacements  are  based  upon  set
hard-times  of  usage.  For  example,  there  may  be  a  requirement  for  a  phase  inspection  after
100  hours  of  pilot  logged  flight  time,  transmission  and  engine  replacement  after  a  specific 
number  of  flight  hours,  jet  engine  power  tests  after  a  designated  number  of  usage  hours, 
or  replacement  of  the  tail-hook  on  a  carrier-based  aircraft  after  a  specific  number  of  traps. 
The  number  of  flight  hours  or  usage  until  required  maintenance  is  determined  by  design 
engineers  based  upon  the  probability  of  when  the  component  is  most  likely  to  fail  and  the 
severity  of  the  consequences  of  its  failure.  These  usage  intervals  are  historically  and 
purposefully  set  to  be  in  a  conservative  to  extremely  conservative  range.  The  greater  the 
severity  of  the  consequence  of  failure,  the  more  conservative  the  required  inspection  or
replacement  time  becomes.  Design  engineers  will  set  an  inspection  interval  or 
replacement  time  that  ensures  the  component  is  inspected  several  times  or  replaced 
before  the  expected  failure  (Rotor  &  Wing  Magazine,  April  2005).  For  instance,  if  the 
bearings  on  a  helicopter's  rotor  head  system  are  expected  to  fail  or  to  wear  to  an 
unacceptable  level  after  500  flight  hours,  the  design  engineers  may  dictate  a  phase 
inspection  where  the  bearings  are  disassembled  and  inspected  every  100  hours,  and  then 
replaced  regardless  of  condition  by  300  hours.  An  aspect  of  the  scheduled  TBO 
maintenance  concept  is  that,  as  the  name  implies,  maintenance  actions  are  tied  to  time. 
Maintenance  planners  must  adhere  to  dictated  usage  limits.  Sometimes  there  is  a  window 
of  time,  an  allowable  plus  or  minus  percentage  of  usage,  permitting  some  flexibility  in 
planning.  The  counting  and  tracking  of  usage  is  critical  in  scheduled  TBO  maintenance. 

While  the  scheduled  or  TBO  concept  of  maintenance  practices  has  served  the 
military  well  over  many  years,  the  concept  has  several  inherent  drawbacks.  The  first  is 
that  a  preponderance  of  inspections  or  replacements  are  conducted  on  perfectly 
functioning  components  only  because  the  usage  time  dictates  so.  If  maintenance  actions 
were  performed  only  when  a  component  was  known  to  be  in  a  state  of  unacceptable 
degradation  or  definitively  failing,  a  great  deal  of  time,  effort  and  costs  could  be  saved. 
Many  inspections  could  be  eliminated  and  perfectly  functioning  parts  could  remain  in 
operation  until  they  were  known  to  be  in  one  of  the  above-mentioned  states.  Another 
drawback  of  scheduled  TBO  is  that  it  is  rarely  based  on  the  history  of  the  components. 
Using  the  prior  example  of  the  bearings  in  a  helicopter's  rotor  system,  if  sufficient  data 
had  been  collected  indicating  that  only  1  in  1000  bearings  had  degraded  by  the  500
flight  hour  TBO,  perhaps  an  inspection  interval  of  every  400  hours  could  produce  the 
same  or  better  safety  and  readiness  levels  with  a  savings  in  time,  maintenance  effort  and 
costs.  "Historical  data"  is  rarely  incorporated  into  the  scheduled  TBO  maintenance 
concept. 
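
The arithmetic behind the bearing example can be made concrete with a short sketch. The 100-hour and 400-hour inspection intervals and the 300-hour replacement time come from the example above; the 2,000-flight-hour horizon used to compare the two intervals is a hypothetical figure chosen only for illustration.

```python
def schedule(interval, replace_at):
    """Flight hours at which a part is inspected before its hard-time replacement."""
    return list(range(interval, replace_at, interval))

def inspections_over(horizon, interval):
    """Number of periodic inspections completed within a span of flight hours."""
    return horizon // interval

# Bearing example: inspect every 100 hours, replace regardless of condition at 300.
bearing_inspections = schedule(100, 300)

# Hypothetical 2,000-hour horizon: conservative interval versus a history-based one.
conservative = inspections_over(2000, 100)
history_based = inspections_over(2000, 400)
inspections_saved = conservative - history_based
```

Under these assumptions the bearing is inspected at 100 and 200 hours and discarded at 300, well short of its expected 500-hour life, and lengthening the interval to 400 hours removes 15 of every 20 inspections.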
2.  Condition  Based  Maintenance

A  different  approach  to  the  performance  of  aircraft  maintenance  is  CBM.  The 
underlying  concept  of  CBM  is  to  perform  aircraft  maintenance  only  when  monitoring 
sensors  indicate  that  maintenance  is  needed  or  will  be  needed  on  a  component  or  system. 
Maintenance  planners  and  maintenance  actions  are  not  tied  to  the  counting  and  tracking 
of  usage;  rather  the  focus  is  on  a  component's  state  or  condition.  Monitoring  sensors 
collect  and  record  status  and  performance  data  of  specific  components  while  in  use.
From  this  data  the  actual  condition,  or  state,  of  the  components  is  then  inferred  for  the 
user.  This  provides  the  ability  to  forego  scheduled  maintenance  on  components  or 
systems  which  have  reached  their  high  times  but  are  still  functioning  properly.  Likewise, 
the  user  can  specifically  identify  a  failed  or  degraded  component  before  its  scheduled 
inspection  and  take  immediate  corrective  maintenance  action.  Additionally,  if  a 
maintenance  planner  is  alerted  to  the  fact  that  a  component  is  degrading,  or  that  its 
performance  is  lessening  although  still  operating  at  an  acceptable  level,  the  planner  is 
afforded  greater  flexibility  in  the  scheduling  of  maintenance.  The  maintainer  not  only
understands  that  the  component  is  wearing  but  also,  perhaps,  at  what  rate,  and  from  that
knowledge  can  choose  the  time  of  a  required  maintenance  action.
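
Knowing the rate of wear is what lets a planner pick the maintenance time. One simple way to picture this, offered only as a sketch, is to fit a straight line to a monitored reading and extrapolate it to an alert limit. The linear trend, the readings and the limit below are assumptions for illustration; real degradation need not be linear.

```python
def hours_until_limit(hours, readings, limit):
    """Extrapolate a least-squares line through the readings up to the limit.

    Returns the projected flight hours remaining after the last reading,
    or None when the readings show no upward trend.
    """
    n = len(hours)
    mh = sum(hours) / n
    mr = sum(readings) / n
    sxx = sum((h - mh) ** 2 for h in hours)
    sxy = sum((h - mh) * (r - mr) for h, r in zip(hours, readings))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no upward trend, so no crossing is projected
    intercept = mr - slope * mh
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - hours[-1])

# Hypothetical readings taken every 10 flight hours, rising 0.02 units per hour.
flight_hours = [0, 10, 20, 30, 40]
wearing = [1.0, 1.2, 1.4, 1.6, 1.8]
steady = [1.0, 1.0, 1.0, 1.0, 1.0]

remaining = hours_until_limit(flight_hours, wearing, limit=3.0)
no_trend = hours_until_limit(flight_hours, steady, limit=3.0)
```

Here the wearing component is projected to reach the limit about 60 flight hours after the last reading, which gives the planner a window in which to schedule the work.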

In  summary,  the  goal  of  the  move  to  CBM  is  to  rapidly  and  accurately  identify 
faults  in  order  to  eliminate  time-consuming  inspections  and  unnecessary  component 
replacements.  Potential  benefits  of  CBM  are  the  minimization  of  maintenance  efforts 
and  component  replacement  along  with  an  increase  in  readiness  and  safety.  Thus,  the 
CBM  concept  has  the  potential  to  eliminate  the  shortfalls  of  scheduled  TBO  maintenance. 

Cases  of  success  have  already  been  demonstrated  by  the  IMD-HUMS  operating  in 
the  30  UH-60L  helicopters.  For  example,  the  system  was  able  to  determine  the  cause  of  a
persistent  buzz  felt  by  aircrew  during  flight.  For  400  flight  hours  prior  to  the  installation 
of  the  IMD-HUMS  the  buzz  had  been  unidentifiable.  After  IMD-HUMS  installation  the 
source  of  the  vibrations  was  isolated  to  the  electrical  generator.  Upon  removal  of  the 
generator  the  spline  adapter  was  found  to  be  severely  worn.  Replacement  of  the  adapter 
eliminated  the  buzz.  Other  benefits  have  been  realized  in  regard  to  several  maintenance 
practices,  readiness,  and  safety.  During  the  thesis  experience  tour  in  which  the  system
was  demonstrated  to  the  authors,  maintainers  stated  that  when  using  the  system  the
processes  of  both  "main  rotor  track  and  balance"  and  "tail  rotor  vibes"  had  become  much
simpler,  quicker  and  more  reliable  with  respect  to  maintenance  requirements.  For  more
successful  applications  of  the  CBM  concept  refer  to  Collacott  (1979),  which  lists  case
studies  and  resulting  benefits  of  CBM  in  the  shipping,  mining,  production,  nuclear  power 
and  aviation  industries.  For  a  better  understanding  of  the  DoD  strategy  and  issues  of 
CBM  see  Butcher  (2000).  This  report  addresses  the  benefits  and  rewards  the  military 
services  are  reaping  through  CBM  as  well  as  issues  concerning  further  implementation  of 
CBM.  The  IMD-HUMS  is  one  of  the  key  CBM  programs  examined  as  a  case  study  in  the  report.

B.  IMD-HUMS 

1.  Purpose 

"...in  the  22  years  I've  been  in  the  Army,  this  is  the  best  program  as  far  as  going
from  reactive  to  pro-active  maintenance..."  Sergeant  First  Class  Reeve,  Delta  Co,  4th  Bn, 
101st  AVN  Div,  7  June  2005. 

Coming  from  one  of  the  maintainers  of  the  30  US  Army  UH-60L  helicopters  with 
IMD-HUMS,  this  quote  by  SFC  Reeve  lends  credence  to  the  potential  and  worth  of  the 
IMD-HUMS.  The  US  Army  plans  to  install  the  IMD-HUMS  on  all  of  its  UH-60M 
helicopters.  In  addition,  the  system  has  been  purchased  by  the  US  Navy  for  installation 
into  CH-53E  helicopters.  The  Navy  is  also  considering  installation  of  this  system  on  the 
H-60,  UH-1,  AH-1,  and  V-22  aircraft  (NAVAIR  e-mail,  8  July  2005).  Goodrich 
Corporation  began  development  of  the  IMD-HUMS  to  perform  CBM  on  helicopters  in
1997  under  the  auspices  of  the  DoD  Commercial  Operations  &  Support  Savings  Initiative 
(COSSI).  The  underlying  purpose  of  the  IMD-HUMS  is  to  improve  flight  readiness  and 
safety,  with  the  added  bonus  of  savings  in  maintenance  effort,  time  and  costs  (Hess,
2001).

The  IMD-HUMS  provides  automated  equipment  usage  tracking  for  life-limited 
components,  from  entry  into  service  until  retirement.  The  usage  tracking  is  used  not  only 
in  the  continuation  of  scheduled  TBO  maintenance  practices,  but  also  for  determining 
accurate  component  lifetimes  and  for  developing  component  fault  prediction  models.
Instrumentation  aboard  the  aircraft  collects  usage  data  during  aircraft  operations,  which  is 
then  applied  to  life-limited  components  currently  installed  on  the  aircraft  (IMD-HUMS 
User  Manual,  2005).  With  the  IMD-HUMS,  component  usage  times  are  automatically
counted  and  tracked  for  TBO;  previously  this  process  was  conducted  manually.  Most 
important,  usage  times  may  be  computed  from  any  number  of  variables,  including  time 
spent  in  various  flight  regimes.  It  stands  to  reason  that  components  of  aircraft  which  fly 
mostly  straight  and  level  and  take  off  and  land  at  improved  airfields  wear  more  slowly 
than  components  on  aircraft  that  are  used  for  high-stress  maneuvers  in  harsh
environments  such  as  the  deserts  of  Iraq.  The  IMD-HUMS  tracks  these
regimes,  and,  through  study,  users  may  be  able  to  determine  which  components  wear
under  which  regimes  and  at  what  rate.  Through  this  capability,  flight  readiness  and  safety
are  enhanced  through  the  early  identification  of  degraded  components  (IMD-HUMS  User 
Manual,  2005). 
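
Computing usage from time spent in each flight regime, rather than from clock hours alone, can be sketched as a weighted sum. The regime names and wear weights below are hypothetical and are not taken from the IMD-HUMS; they only show how two aircraft with the same logged hours could accumulate different effective usage.

```python
# Hypothetical wear weights per flight regime (1.0 = straight and level flight).
REGIME_WEIGHTS = {"straight_level": 1.0, "hover": 1.5, "high_g_turn": 3.0}

def equivalent_usage(hours_by_regime, weights=REGIME_WEIGHTS):
    """Sum the hours flown in each regime, scaled by that regime's wear weight."""
    return sum(weights[regime] * hours for regime, hours in hours_by_regime.items())

# Two aircraft that each logged 100 flight hours, flown in different ways.
transport = {"straight_level": 90.0, "hover": 10.0}
tactical = {"straight_level": 40.0, "hover": 30.0, "high_g_turn": 30.0}

transport_usage = equivalent_usage(transport)
tactical_usage = equivalent_usage(tactical)
```

With these made-up weights the transport aircraft accrues 105 equivalent hours and the tactical aircraft 175, so a usage-based limit would be reached sooner on the harder-flown aircraft.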

2.  Concept  of  Operations 

The IMD-HUMS provides an automated capability to monitor, diagnose and track
usage for many components of a helicopter. Sensors of the IMD-HUMS installed on
the helicopters collect data during flight operations. The initially acquired
measurements are physical in nature: motion, rates of motion, and forces. An acquisition
is the record of a specific set of these measurements over a fixed period of time. For each
acquisition, the IMD-HUMS manipulates these readings through proprietary algorithms
to compute CI and, from these, an HI for each component. The CI are values which depict a
certain aspect of a component's state and are calculated from the raw data of physical
measurements. The CI are aggregated to produce a component's health indicator (HI).
This collection of CI and HI for each acquisition is then used for maintenance
diagnostics.

The two main sub-systems of the IMD-HUMS are the On-Board System (OBS)
and the Ground Station System (GSS). The OBS is physically located on the helicopter
and comprises a cockpit display unit (CDU), a data transfer unit (DTU) and data
transfer memory unit (DTMU), a remote data concentrator (RDC), a main processor unit
(MPU), two junction boxes (JB1/JB2), 30 accelerometers, main and tail rotor magnetic
RPM sensors, a main rotor blade tracker, and engine output shaft optical tachometers.
The GSS is external to the helicopter, runs on a PC and comprises the computer
hardware and software that reads and processes the data collected from the OBS (Figure
1; IMD-HUMS User Manual, 2005).

Figure 1. The Components of the IMD-HUMS (from System Users Manual for
IMD-HUMS)

Each of the many components of an operating helicopter produces an associated
vibration, and it is these vibrations that the IMD-HUMS measures. IMD-HUMS data
collection begins at the various aircraft sensors. For instance, the sensors used for data
collection from the electrical generators are accelerometers; they are located on the
transmission accessory gear box modules, one for each of the two generators (Figure 2).
These accelerometers measure the vibrations which come from all the internal components,
such as gears, shafts and bearings, throughout the transmission accessory gear box
module, not just the electrical generators. Data collected by the accelerometers is then
sent directly to the MPU for processing.

Figure 2. Location of IMD-HUMS Accelerometers on UH-60L Helicopter (from
System Users Manual for IMD-HUMS)


The MPU is located in the aircraft's transition section avionics bay. It is the brain
of the OBS portion of the IMD-HUMS. The MPU receives data from the accelerometers
and performs the following tasks: conversion of analog data into digital data; recognition
of flight regime and determination of regime duration; conversion of data into CI;
recognition of vibration exceedances; and the storage of data for transfer to the DTU.
After the data has been processed by the MPU, the resulting outputs are referred to as
acquisitions (IMD-HUMS User Manual, 2005). This data is in raw data file (rdf) format.

The Ground Station System consists of all the software and hardware, not located on
the helicopter, associated with the analysis of the acquisitions. Once the acquisitions
are  downloaded  from  a  DTMU,  this  data  and  all  other  data  from  all  flights  of  all  aircraft 
using  the  IMD-HUMS  are  available  for  analysis.  The  GSS  will  automatically  generate 
some  of  the  required  maintenance  actions  resulting  from  an  IMD-HUMS  equipped 
aircraft's  flight  (IMD-HUMS  User  Manual,  2005). 


C.  UH-60L  HELICOPTER  AND  ELECTRICAL  GENERATORS 

The  UH-60L  (Blackhawk)  (Figure  3)  is  a  twin  turbine  engine,  single  rotor,  semi- 
monocoque  fuselage  helicopter.  The  primary  mission  capability  of  the  helicopter  is 
tactical  transport  of  troops,  supplies  and  equipment.  Secondary  missions  include  training, 
mobilization,  development  of  new  and  improved  concepts,  and  support  of  disaster  relief. 
The  US  Army  alone  has  over  1,900  H-60  helicopters  in  its  inventory  (International 
Institute  for  Strategic  Studies,  2005).  The  incorporation  of  IMD-HUMS  into  the  H-60 
fleet  is  a  major  financial  investment  with  great  implications  concerning  the  maintenance 
practices of these helicopters.

Figure 3. UH-60L Blackhawk Helicopter (from Operators Manual for UH-60L
Helicopter)


There are two electrical generators in each UH-60L helicopter (Figure 4). They
are mounted on and driven by the transmission accessory gear box module. Each is
capable of supplying the total helicopter power requirements (Operators Manual for
UH-60L Helicopter, 2003). The main components associated with the electrical generators
are as follows: the spur gear located in the accessory transmission module, which transfers
the rotational power to rotate the generator shaft; the bearings, which support and stabilize
the generator shaft; and the generator shaft itself, which rotates along with mounted brushes
to produce electricity.

Figure 4. UH-60L Generator (after Intermediate Maintenance Repair and Special
Tools List for UH-60L)


D.  PHYSICS  OF  VIBRATIONS  AND  EXPLANATION  OF  TERMS 

This section provides an overview of the basic physical concepts, terms and tools
used in CBM and specifically the IMD-HUMS. These concepts are used to describe the
important CI computed by IMD-HUMS and used in this thesis. Also explained is how
these CI are used to assess the health of a component.

1. IMD-HUMS and Mechanical Vibrations

An oscillation is the variation, usually with time, of the magnitude of a quantity
with respect to a specified reference when the magnitude is alternately greater and
smaller than the reference (Harris, 2002). A vibration is an oscillation where the varying
quantity is the parameter that defines the motion of a mechanical system (Harris, 2002). It
is the vibrations from operating components which the IMD-HUMS acquires for analysis.
A rotating high-speed engine shaft, a main transmission gear turning, and a main or tail
rotor blade rotating, twisting, and flapping in multiple directions all produce some type of
vibration. The gears, shafts and bearings of the UH-60L generators, which are the
components chosen for this study, also produce vibrations when in operation.

The IMD-HUMS uses accelerometers, also known as piezoelectric transducers,
to measure mechanical vibrations. Specifically, they measure the rate of change of
velocity, or acceleration, of a component in a particular direction.
Accelerometers convert physical acceleration into analog electrical voltages. These
accelerations oscillate over time; hence the resulting motion is a vibration (Collacott,
1979).

The  peak-to-peak  (P2P)  value  of  a  vibrating  quantity  is  the  algebraic  difference 
between  the  extremes  of  the  quantity  (Harris,  2002).  The  IMD-HUMS  considers  the 
peak-to-peak  value  of  vibrations  because  this  value  tends  to  increase  when  vibrating 
components  begin  to  fail. 

The term envelope (Env) refers to the fact that the background signals are
removed from a vibration, leaving only the portion of the vibration which is to be focused
upon or analyzed (Harris, 2002). The IMD-HUMS extracts the envelope signal for
some of its outputs.

Probability Density Function (pdf) and kurtosis are statistical concepts applied to
vibration analysis. Every vibration has a pdf which characterizes the
probability of a specific instantaneous vibration amplitude occurring. Vibrations of good
operating components usually have pdfs with a bell-shaped curve. Deviations from the
bell-shaped curve can be used to indicate failing or degrading components. The fourth
moment, or kurtosis, of the curve is best suited to capture these deviations. This approach
has been particularly useful in the vibration analysis of bearings (Rao, 2004).
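As a minimal sketch of the two quantities just described, the following computes a peak-to-peak value and a kurtosis (taken here as the fourth standardized moment) for a toy signal. The signals are synthetic and the function names are illustrative; the IMD-HUMS algorithms themselves are proprietary and are not reproduced here. The toy baseline is a pure sine wave rather than a bell-shaped vibration, so only the direction of the change is meaningful.

```python
import math


def peak_to_peak(x):
    """Algebraic difference between the extremes of the signal."""
    return max(x) - min(x)


def kurtosis(x):
    """Fourth standardized moment; a normal distribution gives about 3."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 ** 2)


n = 1000
# Smooth toy vibration: five full sine cycles across the record.
smooth = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
# Same signal with a sharp impulse every 100 samples, a crude stand-in
# for the periodic impacts a defect might produce.
impulsive = [v + (4.0 if i % 100 == 0 else 0.0) for i, v in enumerate(smooth)]

for name, sig in (("smooth", smooth), ("impulsive", impulsive)):
    print(f"{name}: P2P={peak_to_peak(sig):.2f}, kurtosis={kurtosis(sig):.2f}")
```

Both values rise when impulses are added, which is consistent with the use of peak-to-peak and kurtosis as indications of degradation.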

The  term  meshing  is  used  to  define  the  working  contact  or  the  fitting  together  and 
interactions  of  gears.  Meshing  of  gears  results  in  vibrations  which  the  IMD-HUMS 
measures. 

2.  Condition  Indicators  and  Health  Indicators 

Condition Indicator/Indication(s) (CI) and Health Indicator/Indication(s) (HI) are
terms developed by Goodrich Corporation. The CI are variables computed by
IMD-HUMS from the raw vibratory data. They are used as a measure of a component's state at
the time of acquisition. There are several types of CI. The important CI used in this
study are described in the following paragraphs. Up to eight different CI are used by the
IMD-HUMS to calculate a value which summarizes a component's overall state, known
as an HI. For each specific component there is a proprietary algorithm developed by
Goodrich Corporation which determines exactly how its HI is computed. HI are scaled to
have values between 0 and 1. During the time period in which data was collected, an HI
value between 0.0 and 0.32 was normal (operating fine), between 0.33 and 0.66 was called a
warning, and between 0.67 and 1.0 was called an alarm (software changes subsequent to the
data collection period have resulted in changes to the HI scale).
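The HI bands in effect during the data collection period can be written as a small lookup. This is a sketch only: the function name is hypothetical, and the treatment of values falling between the stated band edges (for example 0.325) is an assumption, since the text gives the bands to two decimal places.

```python
def hi_status(hi):
    """Map a health indicator in [0, 1] to the status bands quoted in the text.

    Assumption: values between the printed band edges are assigned to the
    lower band, i.e. below 0.33 is normal and below 0.67 is warning.
    """
    if not 0.0 <= hi <= 1.0:
        raise ValueError("HI is expected to lie between 0 and 1")
    if hi < 0.33:
        return "normal"
    if hi < 0.67:
        return "warning"
    return "alarm"


for value in (0.10, 0.45, 0.90):
    print(value, hi_status(value))
```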

Shaft Order 1 (SO1) is a measurement used to detect dynamic imbalances and
shaft misalignment with supporting structures (usually bearings) of a shaft. It has
dimensions of distance per unit time, measured in IPS (inches per second). A single
oscillation in the resulting vibration occurs (order 1) for each complete shaft revolution
when an imbalance and/or misalignment exists. These imbalances and misalignments are
a result of wearing and degrading shafts and bearings (Harris, 2002).

Shaft Order 2 (SO2), like SO1, is a measure used in detecting shaft misalignment
with supporting structures in a shaft. It has dimensions of distance per unit time,
measured in IPS. Two oscillations in the resulting vibration (order 2) occur for each
complete shaft revolution when a misalignment exists (Harris, 2002).

Residual Peak-to-Peak (Res_P2P) is a measurement of displacement (distance
dimension) in a vibration. The term "residual" speaks to the fact that the strong tones are
first removed from the vibration, leaving only the portion of the peak-to-peak
displacement which results from regularly existing background vibration (Harris, 2002;
P3I VPU/DTD Software Requirements Specifications, 2001).
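The shaft order and residual ideas described above can be sketched with a single-frequency discrete Fourier transform: the amplitude at one and two times the shaft rotation frequency stands in for SO1 and SO2, and subtracting those tones before taking peak-to-peak stands in for a residual measure. The sampling rate, shaft frequency, signal and function names are all assumptions for illustration; the sketch works in the signal's own units and does not attempt the conversion to IPS or the proprietary IMD-HUMS processing.

```python
import cmath
import math


def order_coefficient(x, fs, shaft_hz, order):
    """Complex single-frequency DFT coefficient at order * shaft_hz."""
    w = 2 * math.pi * order * shaft_hz / fs
    return sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x))


def order_amplitude(x, fs, shaft_hz, order):
    """Amplitude of the tone at the given shaft order."""
    return 2 * abs(order_coefficient(x, fs, shaft_hz, order)) / len(x)


def remove_orders(x, fs, shaft_hz, orders):
    """Subtract the reconstructed tone at each listed shaft order."""
    residual = list(x)
    for k in orders:
        c = order_coefficient(x, fs, shaft_hz, k)
        w = 2 * math.pi * k * shaft_hz / fs
        for n in range(len(x)):
            residual[n] -= 2 * (c * cmath.exp(1j * w * n)).real / len(x)
    return residual


# Synthetic record: strong once-per-rev tone, weaker twice-per-rev tone,
# and a small background component at an unrelated frequency.
fs, shaft_hz, n_samples = 1000.0, 10.0, 1000
signal = [
    2.0 * math.sin(2 * math.pi * shaft_hz * n / fs)
    + 0.5 * math.sin(2 * math.pi * 2 * shaft_hz * n / fs)
    + 0.01 * math.sin(2 * math.pi * 70.0 * n / fs)
    for n in range(n_samples)
]

so1 = order_amplitude(signal, fs, shaft_hz, 1)
so2 = order_amplitude(signal, fs, shaft_hz, 2)
residual = remove_orders(signal, fs, shaft_hz, [1, 2])
res_p2p = max(residual) - min(residual)
print(f"order 1: {so1:.3f}, order 2: {so2:.3f}, residual P2P: {res_p2p:.3f}")
```

The record length is chosen so that every tone completes a whole number of cycles, which keeps the single-bin estimates exact for this toy case; real acquisitions would need windowing or synchronous resampling.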

The  ball  energy  measurement  results  from  defects  of  a  spinning  ball  bearing.  This 
measurement  is  used  to  detect  defects  or  wear  in  the  bearings  and  is  in  the  dimensions  of 
force,  distance  and  time  (Rao,  2004). 

Envelope  peak-to-peak  (Env.P2P)  is  a  measure  of  the  periodic  impulses  due  to 
bearing  defects.  Background  signals  within  the  vibration  are  first  removed  from  the 
vibration,  leaving  only  the  portion  of  the  vibration  which  best  depicts  the  bearing  defect. 
Envelope  peak-to-peak  is  in  the  dimension  of  distance  (Rao,  2004). 

Envelope  Kurtosis  (Env.Kurtosis)  is  a  measurement  of  how  the  periodic  impulses 
due  to  bearing  defects  affect  the  curve  of  the  pdf  of  the  bearings'  total  vibration.  Kurtosis 
measures  the  thickness  of  the  tails  of  the  distribution  of  bearing  vibrations  after  the 
background  signals  have  been  removed  (Harris,  2002). 

Envelope  Distributed  Fault  (Env.DF)  is  a  dimensionless  ratio  of  the  standard 
deviations  of  the  envelope  data  (data  after  background  signals  are  removed)  and  all  raw 
data  (the  total  vibration).  This  measurement  is  used  in  the  analysis  of  bearing  defects. 
The  term  "distributed"  refers  to  the  fact  that  all  possible  directions  of  displacement  are 
considered  in  this  measurement  (Harris,  2002). 

Gear  Distributed  Fault  (GDF)  is  a  dimensionless  measurement  resulting  from  the 
ratio  of  unexplained  and  explainable  variances  of  a  vibration  resulting  from  the  meshing 
of  gears.  It  is  believed  that  this  measurement  is  an  indication  of  gear  teeth  wear  and 
cracks (Harris, 2002).

The G2-1 measurement is the result of an algorithm which considers the average
peak-to-peak and energy output of a vibration resulting from the meshing of gears. It is
used in the analysis of gears. The term was developed by Goodrich Corporation and the
algorithm which determines its value is proprietary (P3I VPU/DTD Software
Requirements Specifications, 2001).

Gear Misalignment 1 (GearMis_1) is a dimensionless measurement resulting from
the ratio of the energies of the vibrations produced when gears mesh (Harris, 2002).

III. DATA ANALYSIS


This  section  details  the  process  of  data  analysis  for  this  thesis.  It  begins  with  an 
explanation  and  description  of  the  data  and  how  the  data  are  partitioned  into  a  training 
data  set  and  an  experimental  data  set.  Next,  for  the  training  data  a  brief  graphical 
exploration  of  the  differences  between  good  and  bad  generators  as  well  as  differences 
among  bad  generators  is  given.  The  remainder  of  the  chapter  deals  with  variable 
selection  and  the  fitting  of  the  logistic  regression  and  forest  of  trees  models. 


A.  DATA 

1.  Data  Collection 

The authors first visited the US Army unit conducting the operational test of the
IMD-HUMS.  The  soldiers  of  this  unit,  4th  Regiment  101st  AVN  Division,  are  the 
operators  and  maintainers  of  the  30  UH-60L  helicopters  which  have  IMD-HUMS 
installed.  During  the  ten-day  visit  the  components  and  concept  of  operations  of  the 
system  were  explained,  the  operation  of  the  system  was  witnessed,  and  the  IMD-HUMS 
data  output  was  shown.  The  authors  were  permitted  to  fly  aboard  one  of  the  helicopters 
during  which  time  the  data  collection  process  from  beginning  to  end  was  demonstrated 
and  explained  in  detail.  The  soldiers  then  explained  the  unit-level  data  analysis  and 
maintenance  practices  which  result  from  these  data  collections.  They  also  provided 
several  specific  cases  of  successful  implementation  of  the  system  and  cases  of  interest  for 
possible study. Of particular interest were six electrical generators which had been
replaced for cause. The IMD-HUMS data concerning these replaced generators provided
an  opportunity  to  determine  whether  the  data  can  predict  the  cause  and/or  need  for 
generator  replacement. 

In the two years of IMD-HUMS use in the 30 UH-60L helicopters, data has been
collected on 66 different electrical generators. In these two years six generators were
removed from operations for some reason of fault; the remaining 60 generators were
deemed to be working properly. Where available, pre-fault data, maintenance records
and photographs were used to explain the circumstances of the faulted generators.
Photographs show that some of the faulted generators had worn or totally broken
components. In addition, the maintenance history of each of these faulted generators was
investigated. Table 1 provides a summary of the cases of each of these six faulted
generators. The failures of two of the generators, numbers 9 and 33, were detected during
operation by a generator warning light. Faults in the remaining four generators, numbers
22, 31, 53 and 56, did not trigger the generator warning light. However, each of the four
generators had unusually high SO1 readings upon removal. Three of these generators,
numbers 22, 31 and 53, showed evidence of fault or wear. The removal of generator
number 56 resulted from the case of an identifiable buzz explained earlier in Chapter II.

Table 1. Confirmed Bad Generators

Aircraft / Side   Generator #   Reported Comments                                      Source
450 Left          9             #1 generator failed during shutdown upon APU           1
                                generator coming on during start; replaced #1
                                generator.
829 Left          33            #1 generator bad; replaced generator.                  1
518 Left          22            SO1 near 2 IPS so replaced Spline Adapter Coupler      2
                                during next scheduled maintenance. Evidence of
                                wear and possible improper installation. SO1
                                returned to .05 IPS after replacement.
549 Left          31            Replaced Spline Adapter Coupler due to SO1 at 3        2
                                IPS. Adapter severely worn and two 1 inch cracks
                                found. IPS still high after Adapter replacement so
                                generator also replaced.
515 Right         53            While getting modified with IMD-HUMS, vibration        2
                                was noted, found to have SO1 at 3 IPS. Adapter
                                Coupler was replaced (had some wear) and SO1
                                vibrations dropped below .05 IPS.
518 Left          56            3 Jan 04 Mosul: had a weird buzz on left-hand side     2
                                ceiling, isolated to generator (found to have SO1
                                over 4 IPS). Generator and coupling replaced.

Source:
1 Maintenance Records
2 Johnny Wright and Ground Station Team, IMD-HUMS Fault Detections,
Goodrich Corporation. Draft 5/25/2005 (Ver 117)

2. Data Description

More than 60 GB of data, consisting of acquisition data for all the helicopters'
monitored components, was sent to NPS in rdf format. The data readings concerning
the generator shaft, spur gear and bearings were then extracted and converted to comma
separated value (csv) format for data exploration and analysis. Each IMD-HUMS
acquisition concerning the shaft, spur gear and bearings of a generator results in 169
variables. The 169 variables are listed in Appendix A.


Figure 5. Example of the IMD-HUMS Data in CSV Format
[Layout shown: a response column followed by potential predictor columns X1, X2, ..., X169; 36,743 acquisition rows.]


Each generator was assigned a number, 1 through 66, for ease of identification and
data manipulation. These numbers were then incorporated into the data set. Among the
169 variables recorded for each acquisition are the Health Indicators for the gear, bearing
and shaft. Some generators had acquisitions which numbered in the tens, others in the
hundreds, and others in the thousands. In total, for all 66 generators, there are 36,743
separate data acquisitions from the two-year period during which the IMD-HUMS was
installed. Two generators, numbers 23 and 34, were removed because they
had fewer than 20 acquisitions during their time of operation, leaving data from
64 generators.

The data set is divided into two separate groups: the "training" set, used to
develop models which predict whether a generator is faulty or not based on CI, and an
"experimental" set, used to test how well these models actually predict whether a
generator is faulty or not.

3.  Training  Data  Set 

Each generator in the training set is assigned a binary value of 1 or 0 to classify
its known state. The value of one is given to the generators removed for fault,
henceforth  referred  to  as  bad  generators.  The  value  of  zero  is  given  to  the  generators  not 
removed,  referred  to  as  good  generators. 

A  complication  of  this  binary  classification  system  is  that  there  may  be  bad 
generators,  ones  which  will  eventually  fail,  classified  as  good  because  their  faulty 
condition  has  not  yet  been  identified.  The  large  number  of  good  generators  included  in 
the  training  set  serves  as  protection  from  these  errors,  diminishing  the  influence  of  any 
incorrectly  classified  generators.  This  is  a  critical  assumption  in  the  analysis.  The  fact 
that  each  generator  is  assigned  a  state  of  0  (good)  or  1  (bad)  does  not  mean  these 
generators  are  actually  in  the  assigned  state.  The  assigned  state  of  0  (good)  or  1  (bad)  is 
based  strictly  upon  whether  a  generator  was  removed  for  fault  or  not.  A  generator  with 
an  undetected  fault  would  be  assigned  a  state  of  0  (good).  Likewise  a  generator  which 
was  replaced  for  a  reason  of  fault  and  assigned  a  state  of  1  (bad)  could  actually  have  been 
mechanically  good;  perhaps  the  electrical  contacts  or  wiring  could  have  had  a  short- 
circuit.  This  is  the  reason  the  authors  investigated  the  circumstances  and  maintenance 
actions of each of the replaced generators.

The training set consists of data from 52 of the 64 generators. Five of the training
set generators had been taken off their helicopters for fault and are classified as bad; the
remaining 47 training set generators had no known faults throughout their data history
and are classified as good.

Only CI computed in the last 20 acquisitions of each generator of the training set
were used in the development of the prediction models. This is because the faulted
generators, classified as bad, most likely were not bad throughout their entire two-year
history. By restricting analysis to the last 20 acquisitions, the risk of including
observations from bad generators gathered before the fault occurred is reduced. The
choice of 20 acquisitions is a judgment call made by the authors after inspecting the
general trend of CI and HI. This reduced the training set to a total of 1,040 acquisitions
(52 generators with 20 acquisitions each).
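The construction of the training rows can be sketched as follows: generators with fewer than 20 acquisitions are dropped, the last 20 acquisitions of each remaining generator are kept, and each row inherits the generator's 0/1 label. The tuple layout, function name and demonstration data are hypothetical; the authors' actual data handling is not described at this level of detail.

```python
from collections import defaultdict

MIN_ACQ = 20  # generators with fewer acquisitions are dropped
LAST_N = 20   # acquisitions kept per generator


def build_training_rows(acquisitions, bad_ids, min_acq=MIN_ACQ, last_n=LAST_N):
    """acquisitions: iterable of (generator_id, time_index, ci_values) tuples.

    Returns (generator_id, time_index, ci_values, label) rows, where label is
    1 for generators removed for fault and 0 otherwise.
    """
    by_gen = defaultdict(list)
    for gen_id, t, ci in acquisitions:
        by_gen[gen_id].append((t, ci))

    rows = []
    for gen_id, acqs in sorted(by_gen.items()):
        if len(acqs) < min_acq:
            continue  # too little history to use
        acqs.sort(key=lambda a: a[0])  # oldest first
        label = 1 if gen_id in bad_ids else 0
        for t, ci in acqs[-last_n:]:
            rows.append((gen_id, t, ci, label))
    return rows


# Synthetic demonstration: three generators with 50, 25 and 10 acquisitions.
demo = (
    [(1, t, {"SO1": 0.1}) for t in range(50)]
    + [(2, t, {"SO1": 2.0}) for t in range(25)]
    + [(3, t, {"SO1": 0.1}) for t in range(10)]
)
rows = build_training_rows(demo, bad_ids={2})
print(len(rows))     # two generators kept, 20 rows each
print(52 * LAST_N)   # size of the training set described in the text
```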

4.  Experimental  Data  Set 

The experimental set consists of data from the remaining 12 generators. One
generator, number 33, was taken off its helicopter for fault and the remaining 11
generators in the experimental set worked properly throughout their data history.
However, six of these 11 generators were put on what the users called the "watch list,"
the list of generators with questionable status (Table 2). The watch list consists of
generators which show generator shaft CI or HI values which indicate that perhaps these
generators are beginning to degrade. Two of the generators, numbers 30 and 21, are
considered to be in a priority status due to shaft order 1 (SO1) readings above 2.0 IPS.
The other four watch-list generators have SO1 readings above 1.5 IPS. These generators
are included in the experimental set to make a final determination of their status using the
prediction model.

The five remaining good generators in the experimental data were on the opposite
side of the four watch list generators and the one faulty generator.

Table 2. Generator Watch List

Aircraft / Side   Generator #   Shaft Order 1 Vibrations (IPS)
                                As of 5/27/2005, RTH (Rotor Turn Hours)
545 Left          30            reached 2.3 IPS and is increasing at 2.2 IPS per 100 RTH
516 Left          21            reached 2.5 IPS and is increasing at 0.5 IPS per 100 RTH
441 Left          6             reached 1.5 IPS and increasing at 0.4 IPS per 100 RTH
516 Right         55            reached 1.78 IPS and is increasing at 0.1 IPS per 100 RTH
493 Right         48            reached 1.85 IPS and is not increasing
519 Left          24            reached 1.55 IPS and is increasing at less than 0.1 IPS per
                                100 RTH

Source: Harrison Chin, Dave Green, Eric Mayhew, Johnny Wright, Generator Shaft
Analysis: Expanded Survey Including #441, #515, #516, #518, #519 and #545,
Goodrich Corporation, Draft 5/27/2005 (Ver 3)
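The watch-list grouping described in the text can be reproduced from the SO1 values in Table 2 with a short sketch. The function name and the status labels are hypothetical. Treating a reported value of exactly 1.5 IPS as belonging to the watch list is an assumption, made because generator 6 is listed at 1.5 IPS.

```python
# SO1 readings (IPS) by generator number, as reported in Table 2.
WATCH_LIST = {30: 2.3, 21: 2.5, 6: 1.5, 55: 1.78, 48: 1.85, 24: 1.55}


def watch_status(so1_ips, priority_above=2.0, watch_from=1.5):
    """Classify an SO1 reading using the two levels quoted in the text."""
    if so1_ips > priority_above:
        return "priority"
    if so1_ips >= watch_from:  # assumption: 1.5 IPS itself counts as watch
        return "watch"
    return "not listed"


status = {gen: watch_status(v) for gen, v in WATCH_LIST.items()}
priority = sorted(g for g, s in status.items() if s == "priority")
print(priority)
print(status)
```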

B.  GRAPHICAL  ANALYSIS 

Projection Pursuit (Hastie, Tibshirani & Friedman, 2001), implemented by the
statistical software Ggobi, is used to gain a visual perspective of the relationships among
the  variables.  Ggobi  plots  two-dimensional  projections  of  multi-dimensional  data.  The 
projection  pursuit  algorithm  numerically  searches  for  two-dimensional  projections  which 
maximize  one  of  several  possible  measures  of  interest.  These  projections  are  displayed 
graphically  and  the  plot  is  continually  updated  as  the  algorithm  pursues  “optimal” 
projections.  The  display  is  interactive,  and  Ggobi  allows  the  user  to  stop  the  display  and 
manually  change  the  projection  at  any  point.  By  using  projection  pursuit  several  insights 
are  gained  concerning  the  data. 

Projection Pursuit is first used to study the five bad generators from the training
set. Five variables, the CI SO1, SO2, Env.P2P, GearMis_1, and Ball Energy, from the
169 variables relating to generators are used as input variables. The SO1 and SO2
variables are accepted common indications for shaft conditional state. The remaining
three variables are used to address the conditional state of the gears and bearings.

Figure 6. Ggobi Display - Clusters of Bad Generators


Figure  6  shows  the  Ggobi  graphical  display  of  the  projection  of  these  five 
variables  for  the  five  bad  generators  of  the  training  set.  The  figure  shows  that  four  of  the 
five  bad  generators  form  single  clusters.  Only  one  of  the  generators,  number  53,  forms 
two  clusters,  one  in  the  upper  right  of  the  display,  the  other  in  the  lower  left  of  the 
display.  From  this  display,  one  might  be  tempted  to  propose  that  the  two  clusters 
represent two different time periods. However, this is not the case.

Let y1 and y2 be the linear functions of the five CI computed by Ggobi in Figure 6.
Further, consistent with Figure 6, let

x4 = SO1
x5 = SO2
x6 = Env.P2P
x7 = GearMis_1
x8 = Ball Energy.

Then y1 and y2 can be computed as

[Equation not recoverable from the source text: y1 and y2 are each a linear combination of x4 through x8.]

Plotting y1 and y2 in acquisition time sequence, Figure 7 clearly shows that the y1 and
y2 values of generator number 53 oscillate between the two groups depicted in Figure 6
over time.


Figure 7. Variability of Generator Number 53 (Ggobi y1 and y2)


Ggobi is also used to investigate clustering of generators classified as good and
bad. The same five CI are again used as input variables. The resulting display, Figure 8,
shows a definite difference in grouping between most of the good generators
(light grey dots if viewed in the non-color copy or yellow dots if viewed in the color
copy) and the bad generators (dark grey dots if viewed in the non-color copy or purple if
viewed in the color copy). However, one bad generator, number 9, seems to be clustered
with the good generators.


Figure 8. Ggobi Display - Light gray (yellow) dots are good generators, dark gray
(purple) dots are bad generators

C. INITIAL VARIABLE SELECTION PROCESS

With  169  variables  initially  in  the  training  data  set,  we  reduced  the  number  of 
potential  predictors  based  upon  an  understanding  of  the  IMD-HUMS,  the  physical 
operation  of  the  helicopter  and  generators,  and  the  vibrations  they  produce.  This  variable 
reduction  is  necessary  when  fitting  parametric  models  such  as  logistic  regression.  It  is 
also  desirable  but  not  strictly  required  when  using  certain  data  mining  techniques.  The 
169  variables  include  the  HI  for  the  shaft,  gear  and  bearing.  The  current  practice  is  to 
rely  heavily  upon  the  HI  of  the  generator  shaft  to  assess  the  overall  health  of  the 
generator.  A  single  model  incorporating  acquisitions  from  all  three  components, 
however,  might  better  detect  other  modes  of  fault  or  degradation. 

The  classification  models  are  based  on  variables  that  describe  the  state  of  the 
three  main  components  involved  in  the  operation  of  the  generator:  the  generator's  shaft, 
supporting  bearings  and  supporting  spur  gear.  The  authors  believe  doing  so  explains  the 
overall  state  of  the  generator  better  than  separately  tracking  and  assessing  each 
component's  HI. 

The  first  step  in  variable  reduction  is  to  eliminate  any  variables  which  do  not 
originate  from,  or  directly  address,  one  of  these  three  components  and  their  physics  of 
operation.  For  example,  consider  torque  (a  measure  of  power  output)  readings  of  each 
engine  at  the  time  of  acquisition.  Once  up  and  running,  the  electrical  generators  turn  at  a 
nearly  constant  speed,  under  a  nearly  constant  force,  regardless  of  engine  torque.  The 
transient  run-up  time  to  generator  rotational  speed  is  minimal.  Therefore  the  engine 
torque  readings  are  eliminated  as  possible  predictors.  Explained  another  way,  changes  in 
engine  torque  are  not  expected  to  result  in  significant  changes  of  generator  speed  or 
forces.  The  same  reasoning  is  applied  to  eliminate  other  variables.  For  example, 
acquisition  date/time,  aircraft  tail  number,  airspeed,  main  rotor  speed,  outside  air 
temperature,  main  gear  box  temperature,  and  flight  regime  are  all  eliminated. 

It  seems  this  reasoning  should  also  be  applied  in  determining  whether  position  of 
the  generator  (left  or  right  side  of  the  helicopter)  should  be  included  as  a  variable.  The 
left and right generators are identical and interchangeable in all physical aspects. The
only distinction between them is their name, "left" or "right", given by the side of the
helicopter they are installed on. However, graphical analysis of certain CI, particularly
Residual Peak-to-Peak, shows clear differences in both mean and variance of these values
between left and right generators (Figure 9).


[Figure 9: dot plot; horizontal axis: Generator Number]

Figure 9. Dot Plot of the Residual Peak-to-Peak CI for Each Generator


The differences may be caused by slight variations in the way that complicated
vibrations are transmitted from the accelerometer to the MPU. While the variable
indicating left or right side of the aircraft is not explicitly included in the analysis, the
left/right position is implicitly captured by variables such as residual peak-to-peak and
envelope distributed fault.

Another method of eliminating variables is to drop any redundant or nearly
redundant variables. For instance, shaft orders one, two, three and one-half are all
calculated in three different scales: IPS, OBS, and G forces. Each is a constant multiple
of the others. Thus the shaft order readings in the scale of IPS were kept while the others
were dropped. Normalized versions of the variables ball energy, cage energy, inner race
energy, outer race energy and total bearing energy were also dropped since
non-normalized readings for each of these exist in the data set.

By  eliminating  redundant  variables  and  those  not  directly  involved  with  the 
generator  shaft,  gear  and  bearing,  the  169  variables  were  reduced  to  65  variables. 
Appendix  A  is  a  listing  of  all  169  variables  with  the  65  remaining  variables  highlighted. 

However,  redundancies  still  exist  among  the  remaining  variables.  For  example, 
computation  of  the  sample  correlations  between  the  65  predictor  variables  gives  16  pairs 
of  variables  with  sample  correlations  greater  than  90%.  These  high  correlations  are  an 
indication of multicollinearity among the predictors. In addition, the principal
components of the standardized variables (Hastie, Tibshirani & Friedman, 2001) indicate
that the first 10 principal components account for 68% of the variability of the 65
variables (Figure 10). Over 95% of the variability can be captured with 34 components.
This confirms our suspicion that generator condition can be captured in fewer dimensions
than the current data set. Figure 10 shows the percentage of variance captured in the first
ten principal components of the 65 standardized predictor variables.

[Figure 10: bar plot titled "Relative Importance of Principal Components"; vertical axis: Variances; horizontal axis: Comp. 1 through Comp. 10]

Figure 10. Variance Captured in First Ten Components of Data Set Containing 65
Predictor Variables
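
The pairwise-correlation screen described above can be sketched in a few lines. This is a minimal illustration only, not the authors' S-Plus procedure: the column names and numbers are made up, and flagging on the absolute value of the correlation is an assumption (the text states only "greater than 90%").

```python
import math
from itertools import combinations


def pearson(a, b):
    """Sample Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def highly_correlated_pairs(columns, threshold=0.90):
    """Return (name_a, name_b, r) for every pair of columns with |r| > threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:  # assumption: screen on magnitude, not sign
            pairs.append((name_a, name_b, r))
    return pairs


# Toy columns (hypothetical values, not IMD-HUMS data).
toy = {
    "so1_ips": [0.10, 0.20, 0.30, 0.40, 0.50],
    "so1_g": [1.0, 2.0, 3.0, 4.0, 5.0],   # a constant multiple of so1_ips
    "env_p2p": [3.0, 1.0, 4.0, 1.0, 5.0],
}
flagged = highly_correlated_pairs(toy)
```

Here only the two rescaled shaft-order columns are flagged, which mirrors the reason the OBS and G-force versions were dropped in favor of IPS.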

Table 3. Pairwise Sample Correlations

CI                        CORRELATION    CLASSIFICATION
SHAFT ORDER 2              0.690         STRONG
SHAFT ORDER 1              0.654         MEDIUM
SHAFT ORDER 3              0.491         MEDIUM
ENVELOPE PEAK TO PEAK      0.376         MEDIUM
GEAR DISTRIBUTED FAULT    -0.305         WEAK
BASE ENERGY                0.275         WEAK
BALL ENERGY                0.252         WEAK
BEARING ENERGY             0.197         WEAK

Pairwise sample correlations of the 65 predictors with the binary response
variable indicating good or bad are also computed. Of these, the variables with the
highest correlations are given in Table 3. Correlations with magnitude from 0.0 to 0.33 are
classified as weak, from 0.34 to 0.66 as medium, and from 0.67 to 1.00 as strong.
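
The weak/medium/strong labels can be reproduced with a small helper. This is a sketch under two assumptions not stated in the text: the bands apply to the magnitude of the correlation (the negative entry in Table 3 is labeled weak), and values falling between the stated bands (for example 0.335) go to the lower band.

```python
def classify_correlation(r):
    """Label a correlation using the 0.33 / 0.66 bands quoted in the text."""
    magnitude = abs(r)          # assumption: sign is ignored
    if magnitude < 0.34:
        return "WEAK"
    if magnitude < 0.67:
        return "MEDIUM"
    return "STRONG"


# Correlations copied from Table 3.
table3 = {
    "SHAFT ORDER 2": 0.690,
    "SHAFT ORDER 1": 0.654,
    "SHAFT ORDER 3": 0.491,
    "ENVELOPE PEAK TO PEAK": 0.376,
    "GEAR DISTRIBUTED FAULT": -0.305,
    "BASE ENERGY": 0.275,
    "BALL ENERGY": 0.252,
    "BEARING ENERGY": 0.197,
}
labels = {name: classify_correlation(r) for name, r in table3.items()}
```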

This analysis of correlation provides indications of useful variables for the
models. They are shaft orders 1, 2 and 3, which result from vibrations of the generator
shaft and, in the case of shaft order 1, from the bearings also. Shaft order 2 has the
strongest correlation with the response variable of all 65 predictors. Shaft orders 1 and 3
have the next highest correlations, classified as medium. Envelope Peak-to-Peak, which
results from vibrations of the bearings, shows the next highest correlation, also classified as
medium. The remaining predictors have differing degrees of weak correlation with
respect to the response.

D. LOGISTIC REGRESSION MODEL


Let $n = 1{,}040$ be the total number of observations in the training data set, and let
$Y_i$, $i = 1, 2, 3, \ldots, n$, represent the binary random variable indicating whether the $i$th
observation comes from a bad generator ($Y_i = 1$) or a good generator ($Y_i = 0$). The
logistic regression model assumes that the $Y_i$ are independent Bernoulli variables with
$\pi_i = P(Y_i = 1)$ for $i = 1, 2, 3, \ldots, n$. In addition the logistic regression model "links" $\pi_i$ to
the observed values of the $k$ predictors for the $i$th observation, $x_{i1}, x_{i2}, x_{i3}, \ldots, x_{ik}$, as
follows:

$$\ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}, \qquad i = 1, 2, 3, \ldots, n,$$

where $\beta_0, \beta_1, \ldots, \beta_k$ are the $k + 1$ parameters or coefficients to be estimated.

The benefit of using logistic regression in the model is that it can be used to estimate
$\pi_i$, the probability that the observation comes from a bad generator rather than a good
generator.
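
To make the link function concrete, the sketch below inverts the logit to turn a linear predictor into an estimated probability of "bad". The coefficient and predictor values are hypothetical placeholders chosen for illustration; they are not the fitted values from this study.

```python
import math


def linear_predictor(beta, x):
    """beta[0] is the intercept; beta[1:] pair with the predictor values in x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def probability_bad(beta, x):
    """Invert the logit link: pi = 1 / (1 + exp(-eta)), computed stably."""
    eta = linear_predictor(beta, x)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


beta = [-2.0, 1.5, 0.5]     # hypothetical coefficients (intercept first)
x = [1.0, 1.0]              # hypothetical CI readings for one acquisition
eta = linear_predictor(beta, x)
p = probability_bad(beta, x)
```

Taking the log-odds of the returned probability recovers the linear predictor, which is exactly the relationship the model equation states.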

There is one assumption for logistic regression which our application of the model
violates heavily. Logistic regression requires that the $Y_i$ be independent of one another.
Time-series collection, and the method of classifying an entire generator, not each
individual acquisition, as good or bad, create an unusual dependency between acquisitions
within each generator. To fit the models, the last 20 acquisitions from each generator in
the training set are used, thus violating independent sampling. For instance, a single
worn or damaged ball bearing wears more and more with continued operation, so further
acquisitions will depict more wear and damage. The state of a component is therefore
dependent upon its past state. However, here logistic regression is used to
compute summary statistics rather than for inference. Thus the real proof of the utility of
this approach lies in how well it predicts problems in the generators in both the
training and experimental data sets.

Two approaches are used to fit the logistic regression model. The first method
creates a compact model which has no correlation or left/right generator issues; however,
the second method is chosen for the final model due to better performance.

The  first  method  forces  inclusion  of  shaft,  gear,  and  bearing  CI.  Three  logistic 
regression  models  are  fit:  one  with  CI  originating  only  from  the  bearings,  another  with  CI 
originating  only  from  the  gear,  and  still  another  with  CI  originating  only  from  the  shaft. 
Backwards  elimination  is  used  to  select  variables  for  each  of  these  models,  i.e.  the  CI 
with  the  greatest  p-value  is  eliminated  from  the  model  at  each  step  of  the  backwards 
elimination  procedure.  The  end  result  is  20  variables  for  the  bearings,  14  variables  for 
the  gear  and  5  variables  for  the  shaft.  The  purpose  of  fitting  separate  models  based  on  CI 
from  the  three  separate  components  is  to  ensure  that  potential  predictors  for  each 
component  are  included  in  the  final  model.  These  three  sets  of  variables  are  then 
combined,  and  another  logistic  regression  model  is  fit  using  backwards  elimination  for 
variable  selection.  With  each  logistic  regression  printout  Null  Deviance  (ND)  minus 
Residual  Deviance  (RD)  is  considered.  In  logistic  regression  fits  where  all  modeling 
assumptions  such  as  independence  apply,  a  small  RD  is  desired  but  not  at  the  expense  of 
an  over-fit  model.  Including  all  or  too  many  of  the  potential  variables  would  result  in 
over-fitting;  the  resulting  model  would  predict  the  training  data  set  very  well  but  would 
include  unnecessary  variables  and  may  not  be  usable  for  predictions  on  other  data. 

This process gives a model with only five predictors: SO1, SO2, GearMis1, Ball
Energy, and Env.P2P. These CI have low pairwise correlation and the variable indicating
left/right generator is not needed, but the performance compared to the final model is
inferior (Table 4).

Table 4. Comparison of Logit Model Performance

Model                           Logit Model Fitting Criteria
Over Fit (65 variables)         Null Deviance: 658.4219 on 1039 degrees of freedom
                                Residual Deviance: 0 on 974 degrees of freedom
Under Fit (5 variables)         Null Deviance: 658.4219 on 1039 degrees of freedom
                                Residual Deviance: 213.6768 on 1034 degrees of freedom
Final Model (10 variables,      Null Deviance: 743.8645 on 1039 degrees of freedom
fitted with generator # 7       Residual Deviance: 77.71993 on 1029 degrees of freedom
classified as bad, explained    Likelihood Ratio Test: Chi-Square 666.145,
in Results Chapter)             degrees of freedom 10, Significance .000

The second logistic regression model was fit by the following process. We began
with the 65 variables determined after initial variable elimination. Further elimination of
redundant or similar variables led to the removal of 16 bearing and 2 gear predictors.
These variables were eliminated because the pool of predictors included other variables
derived from the same vibration, differing from the dropped variables only in the
algorithm from which they are derived. For instance, the gear variable "AM kurtosis" was
dropped because "derivative AM kurtosis" is also present.

A logistic regression model was then fit in Clementine using the 47 variables left
in the predictor pool. Backwards elimination was again used to eliminate variables
further, leaving 12 CI. At this point the classification error rate of the model was also
monitored so as to choose the final number of predictor variables in a backwards-stepwise
fashion. Variables continued to be eliminated from the model as long as the
misclassification rate stayed low. When the output showed an increase in the
misclassification rate, the last eliminated variable was reinstated in the model and that
model was deemed best. Using this method, "Envelope Crest Factor" and "Shaft Order 3"
were eliminated, resulting in a final logistic regression model with 10 predictors, an overall
correct classification rate of 99%, and only one observation from a bad generator classified as
good.
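
The stopping rule described above can be written as a short loop. This is a schematic sketch only: `worst_variable` and `error_rate` are hypothetical callbacks standing in for the model-fitting software (they would return the predictor with the largest p-value and the misclassification rate of a refitted model), and the toy values below are invented.

```python
def backward_eliminate(variables, worst_variable, error_rate):
    """Drop the least significant variable until the error rate would rise."""
    current = list(variables)
    current_rate = error_rate(current)
    while len(current) > 1:
        candidate = worst_variable(current)
        reduced = [v for v in current if v != candidate]
        reduced_rate = error_rate(reduced)
        if reduced_rate > current_rate:
            break                 # keep the candidate: the larger model is deemed best
        current, current_rate = reduced, reduced_rate
    return current


# Toy stand-ins (hypothetical): fixed p-values and an error rate that
# jumps as soon as either "a" or "b" is removed.
toy_p_values = {"a": 0.001, "b": 0.010, "c": 0.200, "d": 0.500}


def toy_worst(variables):
    return max(variables, key=lambda v: toy_p_values[v])


def toy_error(variables):
    return 0.01 if {"a", "b"} <= set(variables) else 0.20


kept = backward_eliminate(["a", "b", "c", "d"], toy_worst, toy_error)
```

In a real fit the p-values change after every refit, so the callbacks would need to refit the model on each call.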

Table 5. Logit Model Classification Rate

Classification

                      Predicted
Observed              0 (bad)    1 (good)    Percent Correct
0 (bad)               919        1           99.9%
1 (good)              9          111         92.5%
Overall Percentage    89.2%      10.8%       99.0%
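
The percentages in Table 5 can be recomputed from the four cell counts, as in the sketch below. Classes are referred to only by their codes 0 and 1 as printed in the table; the dictionary layout and key names are illustrative choices.

```python
# Cell counts copied from Table 5, keyed by (observed, predicted).
counts = {(0, 0): 919, (0, 1): 1, (1, 0): 9, (1, 1): 111}


def classification_rates(counts):
    """Row-wise percent correct, predicted-class shares, and overall accuracy."""
    total = sum(counts.values())
    rates = {}
    for cls in (0, 1):
        row_total = counts[(cls, 0)] + counts[(cls, 1)]
        rates[f"row_{cls}_correct"] = 100.0 * counts[(cls, cls)] / row_total
        column_total = counts[(0, cls)] + counts[(1, cls)]
        rates[f"predicted_{cls}_share"] = 100.0 * column_total / total
    rates["overall_correct"] = 100.0 * (counts[(0, 0)] + counts[(1, 1)]) / total
    return rates


rates = classification_rates(counts)
```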

Table 6. Logit Model Fitting Information

Model Fitting Information

                  Model Fitting Criteria    Likelihood Ratio Tests
Model             -2 Log Likelihood         Chi-Square    df    Sig.
Intercept Only    743.864
Final             77.720                    666.145       10    .000

Goodness-of-Fit

            Chi-Square    df      Sig.
Pearson     6525.330      1029    .000
Deviance    77.720        1029    1.000

Pseudo R-Square

Cox and Snell    .473
Nagelkerke       .926
McFadden         .896
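
The three pseudo R-square values in Table 6 can be recovered from the two -2 log-likelihood figures and the sample size using the standard definitions. The sketch below is a cross-check under the assumption that the software used those standard formulas; the function name is illustrative.

```python
import math


def pseudo_r_squared(null_m2ll, final_m2ll, n):
    """Cox and Snell, Nagelkerke, and McFadden measures from -2 log-likelihoods."""
    cox_snell = 1.0 - math.exp(-(null_m2ll - final_m2ll) / n)
    max_cox_snell = 1.0 - math.exp(-null_m2ll / n)   # upper bound of Cox and Snell
    nagelkerke = cox_snell / max_cox_snell
    mcfadden = 1.0 - final_m2ll / null_m2ll
    return cox_snell, nagelkerke, mcfadden


# -2 log-likelihoods from Table 6; n = 1,040 training observations.
cox_snell, nagelkerke, mcfadden = pseudo_r_squared(743.864, 77.720, 1040)
```

Rounded to three decimals, these agree with the tabulated .473, .926 and .896.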

Table 7. Logit Model Likelihood Ratio Tests

Likelihood Ratio Tests

                              Model Fitting Criteria    Likelihood Ratio Tests
Effect                        -2 Log Likelihood of      Chi-Square    df    Sig.
                              Reduced Model
Intercept                     119.540                   41.820        1     .000
Shaft Order 1 (IPS)           145.015                   67.295        1     .000
Shaft Order 2 (IPS)           144.412                   66.692        1     .000
Gear Distributed Fault        138.842                   61.122        1     .000
G2-1                          80.286                    2.566         1     .109
Residual Peak to Peak         141.611                   63.891        1     .000
Gear Misalignment 1           94.404                    16.684        1     .000
Ball Energy                   78.850                    1.130         1     .288
Envelope Peak to Peak         184.416                   106.696       1     .000
Envelope Kurtosis             117.695                   39.975        1     .000
Envelope Distributed Fault    79.924                    2.204         1     .138

The chi-square statistic is the difference in -2 log-likelihoods between the final
model and a reduced model. The reduced model is formed by omitting an effect
from the final model. The null hypothesis is that all parameters of that effect are 0.
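
As a worked check of the note above, the sketch below recomputes the chi-square and significance columns for a few effects in Table 7. It uses the final-model -2 log-likelihood of 77.720 from Table 6 and the identity that a chi-square tail probability with one degree of freedom equals erfc(sqrt(x/2)); the dictionary and function names are illustrative.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


final_m2ll = 77.720               # -2 log-likelihood of the final model (Table 6)

# -2 log-likelihoods of reduced models, copied from Table 7.
reduced_m2ll = {
    "G2-1": 80.286,
    "Ball Energy": 78.850,
    "Envelope Distributed Fault": 79.924,
    "Envelope Peak to Peak": 184.416,
}

lr_tests = {}
for effect, m2ll in reduced_m2ll.items():
    chi_square = m2ll - final_m2ll
    lr_tests[effect] = (chi_square, chi2_sf_df1(chi_square))
```

The three effects with the smallest chi-square values are the ones whose significance exceeds .10 in the table.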

Table 8. Logit Model Parameter Estimates

                              B          Std. Error    Wald      df    Sig.
Intercept                     37.010     7.225         26.242    1     .000
Shaft Order 1 (IPS)           -9.369     2.025         21.407    1     .000
Shaft Order 2 (IPS)           -98.357    20.245        23.603    1     .000
Gear Distributed Fault        -39.442    8.025         24.153    1     .000
G2-1                          .019       .013          2.370     1     .124
Residual Peak to Peak         1.004      .207          23.473    1     .000
Gear Misalignment 1           .189       .054          12.223    1     .000
Ball Energy                   -96.958    124.581       .606      1     .436
Envelope Peak to Peak         -6.652     1.244         28.571    1     .000
Envelope Kurtosis             3.979      .975          16.653    1     .000
Envelope Distributed Fault    61.703     44.512        1.922     1     .166

E.  TREE  MODELS 

Due to the large number of possible predictor variables (CI) available in the data
set, a nonparametric, data mining approach is used to augment and check the predictions
of the logistic regression model. We use a procedure based on Classification and
Regression Trees (CART) developed by Breiman, Friedman, Olshen and Stone in 1984.
CART searches all predictors in a data set for the split that most reduces the
variability of the dependent variable within the resulting subsets. This
creates two nodes, each of which can be split again. This continues until a minimum
threshold is reached.
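
A single step of this search, restricted to one predictor, can be sketched as follows. Gini impurity is used here as the measure of variability, which is an assumption made for illustration (the S-Plus tree routine may use a deviance criterion instead), and the data values are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value < threshold'.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue                      # cannot split between equal values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Toy CI readings and good (0) / bad (1) labels, hypothetical values.
readings = [0.1, 0.2, 0.3, 0.8, 0.9]
status = [0, 0, 0, 1, 1]
threshold, impurity = best_split(readings, status)
```

A full tree repeats this search over every predictor at every node and then recurses on the two resulting subsets.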

The tree-fitting process provides information about predictor importance as well
as a decent prediction. However, it is vulnerable to over-fitting and thus requires
cross-validation and pruning (limiting the number of splits). Figure 11 shows the un-pruned
classification tree created from the last 20 acquisitions of each generator in the training
set. The 65 CI determined by initial variable elimination are used to fit this tree.
Appendix F displays the remaining S-Plus training set classification tree output.

Figure 11. S-Plus Classification Tree using Training Set. The inequality above each
split corresponds to the left branch. At each leaf "G" indicates a leaf with a higher
proportion of good generators and similarly "B" indicates a leaf with a higher proportion
of bad generators.


Classification trees are an intuitive way to see how the data can be split into
subsets capable of predicting the dependent variable. However, their accuracy is not
always satisfactory. Leo Breiman introduced the concept of aggregating many different
trees and allowing them to each “vote” on their prediction of the dependent variable
(Berk, 2005). Different aggregation methods have been developed which create the
multiple trees, or forests, in different ways. Bagging builds trees on many bootstrap
samples. Boosting is a more complicated method which first seeks out errors while
re-sampling from original data in order to focus on the marginal boundaries. Accurate trees
are then given more weight in the vote; this process creates predictions with excellent
misclassification rates. Here, the random forests method is used as a nonparametric cross
check to the logistic regression model because it builds new trees by randomly choosing a
subset of predictor variables each time. Pruning is not required as the aggregated voting
process protects against over-fitting. This algorithm is ideal for the large IMD-HUMS
data set. Five hundred trees are fitted to the last 20 acquisitions from each generator in
the training set and allowed to vote using the randomForest function in the R statistical
environment. The resulting misclassification rate is 0.00673.

The forest model is then used to predict the entire training set (misclassification
rate 0.01420) as well as the experimental set (see Results section). One drawback to the
random forest is its “black box” nature, which restricts insight into how predictions are
made, although variable importance is obtainable.
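
The voting step can be illustrated without fitting any trees. The sketch below takes per-tree votes as given and shows only the aggregation and the resulting misclassification rate; it is not the R randomForest implementation, the votes are invented, and sending ties to class 0 is an assumption made for simplicity.

```python
def majority_vote(votes):
    """Return 1 if more than half of the 0/1 tree votes are 1, else 0."""
    ones = sum(votes)
    return 1 if 2 * ones > len(votes) else 0   # assumption: ties go to class 0


def misclassification_rate(votes_per_observation, truth):
    """Share of observations whose majority vote differs from the true class."""
    predictions = [majority_vote(v) for v in votes_per_observation]
    errors = sum(p != t for p, t in zip(predictions, truth))
    return errors / len(truth)


# Hypothetical votes from five trees for four acquisitions.
toy_votes = [
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],   # a bad acquisition that the toy forest misses
]
toy_truth = [0, 1, 0, 1]
toy_rate = misclassification_rate(toy_votes, toy_truth)
```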

IV. RESULTS


In  the  final  phase  of  the  study,  for  both  the  training  and  experimental  data  we 
compare  the  status  of  the  generators  to  the  estimated  probability  of  being  a  bad  generator 
based  on  the  logistic  regression  model  and  the  classification  of  being  bad  or  good  based 
on  the  forest  of  trees.  In  addition,  we  track  the  three  HI  (the  HI  for  the  shaft,  gear,  and 
bearing)  provided  by  IMD-HUMS. 

Only  the  last  20  acquisitions  for  each  generator  are  used  to  construct  the  logistic 
regression  and  forest  of  trees  models.  As  a  check  of  these  methods,  probabilities  of  bad 
are  estimated  for  each  acquisition  in  the  entire  two-year  period  for  which  IMD-HUMS 
data  is  available.  As  an  example  of  how  we  compare  results,  consider  generator  number 
43.  Generator  number  43  is  in  the  training  set  and  classified  as  a  good  generator.  The 
plot  of  estimated  probability  of  bad  (circular  dots)  based  on  logistic  regression  and 
classification  of  being  good  (0)  or  bad  (1)  (triangles)  based  on  forest  of  trees  is  given  in 
Figure 12.

[Figure 12 legend: circles, Logit Prediction; triangles, Forest Prediction]

Figure 12. Generator Number 43: Plot of Estimated Probability of Bad from the
Logistic Regression Model and the Classification of Being Good (0) or Bad (1) from the
Forest of Trees Model for the Entire Two-Year Acquisition Period


Notice  that  the  estimates  of  the  probability  of  being  bad  based  on  the  logistic 
regression  vary  from  acquisition  to  acquisition,  even  rising  above  0.5,  but  for  the  most 
part  are  small  with  the  majority  of  estimates  below  0.1.  For  this  generator  the  forest  of 
trees  classifies  the  generator  as  good  for  every  acquisition. 

To see the trends in the estimated probabilities from the logistic regression more
clearly, in Figure 13 we superimpose a smooth nonparametric fit of the estimated
probabilities using a loess smoother (Montgomery, 2001). At each acquisition, the loess
smoother fits a weighted regression using only the nearest neighbors to that acquisition.
The number of nearest neighbors used is governed by the span, the proportion of the total
number of observations in the data set. The larger the span, the more extensive the
smoothing. For most generators, the loess fit with a span of .3 gives a smooth estimate of
the probability of bad, which can in turn be used to predict the generator status. However,
loess fits with a span of .3 for generator numbers 7, 53, and 56 are not smooth; thus
cross-validation is used to automatically set span parameters between .3 and .5. This
cross-validation is implemented by default when using the S-Plus function. For
consistency, all graphs of the training set generator predictions in the remainder of the
chapter are shown with the S-Plus "auto" span parameter. Experimental set generator
predictions are all shown with a .3 span parameter. Figure 13 shows the loess fit for
generator number 43. For this generator, the loess fit is a straight line at zero. Thus, the
logistic regression results indicate that the generator should be classified as good.
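
The idea behind the smoother can be sketched as a local linear fit with tricube weights over the nearest neighbors selected by the span. This is a simplified illustration, not the S-Plus loess function: it performs no robustness iterations and no automatic span selection, and the data are invented.

```python
import math


def loess(x, y, span=0.3):
    """Local linear smoother: one tricube-weighted least-squares fit per point."""
    n = len(x)
    k = max(2, min(n, math.ceil(span * n)))      # number of nearest neighbors
    fitted = []
    for x0 in x:
        distances = sorted(abs(xi - x0) for xi in x)
        bandwidth = distances[k - 1]             # distance to the k-th neighbor
        sw = swx = swy = swxx = swxy = 0.0
        for xi, yi in zip(x, y):
            d = abs(xi - x0)
            if bandwidth > 0:
                u = d / bandwidth
                w = (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0
            else:
                w = 1.0 if d == 0 else 0.0
            dx = xi - x0                         # center on x0 so the intercept is the fit
            sw += w
            swx += w * dx
            swy += w * yi
            swxx += w * dx * dx
            swxy += w * dx * yi
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)              # degenerate case: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw)
    return fitted


# Hypothetical series: estimated probabilities near zero with one isolated spike.
times = [float(t) for t in range(20)]
probs = [0.0] * 20
probs[10] = 1.0
smooth = loess(times, probs, span=0.3)
```

An isolated spike is pulled well below its raw value, which is the behavior the text relies on to avoid reacting to single-acquisition fluctuations.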


[Figure 13 legend: circles, Logit Prediction; line, Loess Logit; triangles, Forest Prediction; horizontal axis: 2003 - 2005]

Figure 13. Generator Number 43: Plot of Estimated Probability of Bad from the
Logistic Regression Model with Smoothing and the Classification of Being Good (0) or
Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition Period

Analogous to the health indicators, we use the loess smooth of the estimated
probabilities to indicate that a generator is good, or to assign a strong, moderate, or weak
classification to a generator that is bad. When the loess fit has values greater than .66,
we say that the logistic regression strongly indicates the generator as bad. When the
loess fit has values greater than .33 but no greater than .66, we say that the logistic
regression moderately indicates the generator as bad. A value above 0 but no greater
than .33 shows a weak indication, and a straight loess fit line at 0 indicates good. A
summary of the results is given in Table 9 for the training data, and special cases are
discussed in detail in this chapter.


Table 9. Classification of Training Set Generators Based On the Logistic
Regression Fit. (>.66 - 1.0 Strong, >.33 - .66 Moderate, >0.0 - .33 Weak, 0.0 Good)

Prior             Logit Predictions
Classification    Good    Weak    Moderate    Strong    Total
Good              40      6       0           0         46
Bad               0       0       1           5         6*
Total                                                   52

* includes additional generator discovered during model formulation

The rule used for the results of the forest of trees is that a majority of 1.0 predictions is
a strong classification and a minority of 1.0 predictions is a moderate classification.
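
The two decision rules stated above can be written as small functions. This is a sketch under assumptions not spelled out in the text: the logit rule is applied to the maximum of the loess curve, a forest with no 1.0 predictions is labeled good, and an exact half-and-half split of forest predictions is treated as moderate.

```python
def classify_logit(loess_values):
    """Label a generator from its smoothed probability-of-bad curve."""
    peak = max(loess_values)      # assumption: the peak of the curve decides
    if peak > 0.66:
        return "strong"
    if peak > 0.33:
        return "moderate"
    if peak > 0.0:
        return "weak"
    return "good"


def classify_forest(predictions):
    """Label a generator from its per-acquisition 0/1 forest predictions."""
    ones = sum(predictions)
    if ones == 0:
        return "good"             # assumption: no bad votes means good
    if 2 * ones > len(predictions):
        return "strong"
    return "moderate"


# Hypothetical curves and predictions for illustration.
flat_curve = [0.0, 0.0, 0.0, 0.0]
rising_curve = [0.05, 0.20, 0.45, 0.80]
logit_labels = (classify_logit(flat_curve), classify_logit(rising_curve))
forest_labels = (classify_forest([0, 0, 0, 0]), classify_forest([1, 1, 0, 1]))
```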

A.  RESULTS  FOR  GENERATORS  IN  THE  TRAINING  SET 

After  fitting  the  logistic  regression  model  and  forest  of  trees  to  the  last  20 
acquisitions  of  each  generator  in  the  training  set,  the  models  are  used  to  predict  generator 
state  throughout  the  entire  two-year  period  in  which  the  training  set  was  collected.  This 
serves  as  additional  validation  of  the  models,  as  well  as  providing  additional  information 
about  behavior  of  the  faulty  generators.  Appendix  B  provides  an  overview,  while 
subsequent  subsections  cover  specific  findings  for  generators  in  the  training  set. 

1. Four Bad Generators: Numbers 22, 31, 53, 56

Of the 52 generators in the training set five are classified as bad. Four of these
(numbers 22, 31, 53, 56) are similar in that they have high SO1 CI. For all four of these,
the generator was determined to be faulty upon inspection. Figures 12 and 13 provide an
example of the plot for a good generator. In contrast, Figure 14 shows the corresponding
plots for the four generators from the training set for which damage was found upon
inspection. It is not surprising that both the logistic regression and forest of trees models
classify generators with proven damage as bad, since these generators were used for
model fitting and their CI have values which form clusters separated from the values of
the CI from the rest of the training set (see Figure 6). In particular, these generators have
high SO1 and SO2 CI compared to the good generators in the training set. Generator
number 53 is unusual in the amount of variation present between acquisitions, as shown
in Figure 14. There may be something different about the failure mode for
this generator, but no clear-cut, specific cause that accounts for this variation has been
identified; the phenomenon is worthy of study.


[Figure 14: one panel per generator, each labeled BAD with the generator number; legend: circles, Logit Prediction; line, Loess Logit; triangles, Forest Prediction; horizontal axis: acquisition date, 2003 - 2005]

Figure 14. Generators Numbers 22, 31, 53, 56: Plot of Estimated Probability of Bad
from the Logistic Regression Model with Smoothing and the Classification of Being
Good (0) or Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition
Period


2.  Generator  Number  9 

Generator number 9 is classified as a bad generator because of an actual failure.
During operation the helicopter did not receive electrical power from this generator,
resulting in the illumination of a generator-fail warning light. After the generator was
replaced with a new one, the problem went away. Both the logistic regression and forest
of trees models classify the number 9 generator as bad, but not strongly (see Figure 15).
These results are consistent with the plot in Figure 6, which shows that generator number
9 is not easily distinguished from the good generators. The figure depicts good generator
data points (light grey dots if viewed in the non-color copy or yellow dots if viewed in
the color copy) and bad generator data points (dark grey dots if viewed in the non-color
copy or purple dots if viewed in the color copy) using five important CI as variable
inputs. The dark grey dots intermingled with the good generator data points are primarily
from generator number 9. This raises the question: Was the generator failure merely
electrical in nature (such as an electrical short-circuit) and not mechanical, and therefore
undetectable by the IMD-HUMS CI? This generator may be classified as bad in the
logistic regression and forest of trees models only because it is in the training set and was
used to build both models. Perhaps it is
detected in the logistic and forest of trees models due to over-fitting as a result of its
binary bad classification.

[Figure 15 legend: circles, Logit Prediction; triangles, Forest Prediction]

Figure 15. Generator Number 9: Plot of Estimated Probability of Bad from the
Logistic Regression Model with Smoothing and the Classification of Being Good (0) or
Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition Period

For generator number 9 there are some acquisitions for which the bearing HI is in
the warning range, but these warnings are present on many good generators. To justify
inclusion of generator number 9 on the bad generator list, two mini-experiments are
performed. In the first, the data is perturbed by giving a binary classification of good (0)
to generator number 9 and fitting a new logistic regression model. Alarmingly, generator
number 9 is then predicted to be a perfectly good generator. In this modified data the
only bad generators are the four generators with high SO1 CI (numbers 22, 31, 53, 56).
In the second mini-experiment, the data is perturbed further by giving a binary
classification of bad (1) to a perfectly good generator, generator number 26. The new
logistic regression model gives this good generator (now labeled bad) estimated
probabilities of being bad no higher than .3, yet gives generator number 9 (still labeled
good) estimated probabilities in the .5 to .95 range. This suggests generator number 9
is not a case of a good generator misclassified as a bad generator. Therefore generator
number 9 is retained as a bad generator and is an important element of the logistic and
forest of trees models. The mode of failure of generator number 9 may be different from
the other failure modes and unique in the data set.

3.  Generator  Initially  Classified  as  Good 

One  generator  initially  classified  as  good  in  the  training  set  is  detected  by  the 
logistic  regression  model  as  being  misclassified.  For  generator  number  7  the  logistic 
model  gives  strong  estimates  of  being  bad  (values  of  1.0,  much  stronger  than  generator 
number  9)  and  then  rapidly  drops  off  to  estimates  of  being  good  (values  of  0.0)  around 
July  2005,  see  Figures  16.  Figure  17  plots  in  EXCEL  the  three  IMD-HUMS  produced 
health  indicators  and  depicts  the  dramatic  change  from  a  bad  conditional  state  to  a  good 
conditional  state  for  generator  number  7. 
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Figure  16.  Generator  Number  7:  Plot  of  Estimated  Probability  of  Bad  from  the 
Logistic  Regression  Model  with  Smoothing  and  the  Classification  of  Being  Good  (0)  or 
Bad  (1)  from  the  Forest  of  Trees  Model  for  the  Entire  Two-Year  Acquisition  Period 


Figure  17.  Generator  Number  7:  EXCEL  Plot  of  IMD-HUMS  Produced  Shaft,  Gear 
and  Bearing  HI  (note  the  change  from  bad  to  good  HI) 
The generator also had strong bearing HI indications, which drop off at the same
time as the logistic regression estimates. Based on these results, Goodrich Corporation
re-examined the records for generator number 7 and confirmed that the accessory gearbox,
which houses the gear and bearing for the generator, had been replaced on the aircraft
(Bechhoefer, 2005). The model thus correctly identified the generator as unhealthy
before the maintenance and as healthy after it. The pre-maintenance acquisitions of
generator number 7 were then reclassified as bad and the logistic regression model was
refitted. The null deviance increased from 658.42 to 743.86, and the residual deviance
decreased slightly from 77.76 to 77.72. The final forest of trees model was then also
refit with generator number 7 included as a bad generator.
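
The reported null deviances can be checked by hand, because the null deviance of a binary
response depends only on the counts of bad and good acquisitions. The sketch below is an
illustration, not the S-Plus computation used in the study. It takes the 1040 training
acquisitions and the 0.09615 bad proportion (100 acquisitions) from the root node of the
Appendix F tree. The count of 120 bad acquisitions after reclassifying generator number 7
is an inferred assumption, not a figure stated in the text. The function name
null_deviance is hypothetical.

```python
import math


def null_deviance(n_bad: int, n_total: int) -> float:
    """Deviance of an intercept-only model for a binary response.

    Assumes 0 < n_bad < n_total so that both logarithms are defined.
    """
    n_good = n_total - n_bad
    p_bad = n_bad / n_total
    log_likelihood = n_bad * math.log(p_bad) + n_good * math.log(1.0 - p_bad)
    return -2.0 * log_likelihood


# 100 bad of 1040 acquisitions (root node of the Appendix F tree).
before = null_deviance(100, 1040)
# 120 bad of 1040 is an assumed count after reclassifying generator number 7.
after = null_deviance(120, 1040)
print(round(before, 2), round(after, 2))
```

Under these counts the two values agree with the reported 658.42 and 743.86 to two
decimal places, which suggests that about twenty acquisitions of generator number 7 were
relabeled as bad.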

4. Loess Smoothing and "Weak" or "Scattered" Logistic Regression
Predictions

The behavior of the condition indicators (CI) is complex and highly variable. Spikes
that are not easily linked to a specific cause can occur; further, it is difficult to
determine any periodicity. This complex behavior can be seen in varying degrees on many
generators, and it affects HI calculations and logistic regression predictions. The
forest of trees appears more robust to these fluctuations than the logistic regression,
possibly due to its repetitive re-sampling and voting process. To avoid high false alarm
rates, loess smoothing is performed on the logistic regression predictions using S-Plus
(smoothing parameter 0.3 or auto-default for the training set, 0.3 for the experimental
set). A generator's logistic prediction is considered bad if its loess curve ever moves
above .33, with anything over .66 considered a strong prediction. A "weak" prediction
occurs when there are enough spikes to pull the loess curve above zero but not above
.33. A "scattered" classification occurs when there are one or more spikes but the loess
smooths them down to zero. Figure 18 shows examples of "weak" and "scattered" logistic
regression predictions.
Figure 18. Generator Numbers 2 and 18: Examples of "Scattered" and "Weak" Logit
Plots of Estimated Probability of Bad from the Logistic Regression Model with
Smoothing and the Classification of Being Good (0) or Bad (1) from the Forest of Trees
Model for the Entire Two-Year Acquisition Period
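
The banding rules described above can be written down as a short function. This is a
minimal sketch, not the procedure run in S-Plus: it assumes the loess curve has already
been computed elsewhere and is passed in as a list. The function name logit_band, the
tolerance tol used to decide that the curve sits at zero, and the spike cutoff of 0.5
are all assumptions made for illustration.

```python
def logit_band(raw, smoothed, tol=1e-6, spike=0.5):
    """Label one generator's logit results.

    raw      -- per-acquisition estimated probabilities of "bad"
    smoothed -- loess-smoothed values of the same series (computed elsewhere)
    tol      -- assumed numerical tolerance for "the loess curve is at zero"
    spike    -- assumed cutoff for calling a raw estimate a spike
    """
    if len(raw) != len(smoothed) or not raw:
        raise ValueError("raw and smoothed must be non-empty and the same length")
    peak = max(smoothed)  # the rules refer to the curve "ever" exceeding a level
    if peak > 0.66:
        return "strong"
    if peak > 0.33:
        return "moderate"
    if peak > tol:
        return "weak"
    if any(p >= spike for p in raw):
        return "scattered"  # spikes exist but the smoother flattens them
    return "good"


print(logit_band([0.0, 1.0, 0.0], [0.0, 0.0, 0.0]))
```

Only the strong and moderate bands count as a bad prediction under the .33 rule; weak
and scattered results flag activity that the smoother does not treat as an alarm.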


5.  Good  Generators 

Only three generators classified as good have any bad (1) forest of trees
predictions (numbers 10, 39, 65). These few bad predictions are sporadic, and each time
they are accompanied by weak or scattered logit predictions, as depicted in Figure 19.
However, with the loess smoother applied, the logistic regression model classifies these
three generators strictly as good.

Figure  19.  Generator  Number  39:  Example  of  Sporadic  Forest  of  Trees  Predictions, 
Plot  of  Estimated  Probability  of  Bad  from  the  Logistic  Regression  Model  with  Smoothing 
and  the  Classification  of  Being  Good  (0)  or  Bad  (1)  from  the  Forest  of  Trees  Model  for 
the  Entire  Two-Year  Acquisition  Period 


B.  RESULTS  FOR  GENERATORS  IN  THE  EXPERIMENTAL  SET 

With generator number 7 reclassified as bad prior to its maintenance, and with
both models refit with this reclassification, the logistic regression and forest of trees
models are applied to the experimental set. A summary of the logit results is given in
Table 10.
Table 10. Classification of Experimental Set Generators Based on the Logistic
Regression Fit (>.66 - 1.0 Strong, >.33 - .66 Moderate, >0.0 - .33 Weak, 0.0 Good)

Prior                    Logit Predictions
Classification   Good   Weak   Moderate   Strong   Total
Good             4      0      0          1        5
Bad              1*     0      0          0        1
Watch List       2      2      1          1        6
Total                                              12

* generator #33 removed due to generator caution light;
  no IMD-HUMS indications or model predictions of being bad
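
The counts in Table 10 can be re-derived from the per-generator rows of Appendix C. The
sketch below is one way to do that tally. It transcribes the Appendix C status and Logit
columns and reads an empty Logit entry as "good", which is an assumption about how the
blank cells were meant. The names ROWS, BANDS and crosstab are hypothetical.

```python
from collections import Counter

# (generator number, prior classification, logit band) transcribed from Appendix C.
# An empty Logit cell in Appendix C is read here as "good" (assumption).
ROWS = [
    (6, "Watch List", "good"), (15, "Good", "strong"),
    (21, "Watch List", "weak"), (24, "Watch List", "good"),
    (30, "Watch List", "strong"), (33, "Bad", "good"),
    (40, "Good", "good"), (48, "Watch List", "weak"),
    (55, "Watch List", "moderate"), (58, "Good", "good"),
    (64, "Good", "good"), (66, "Good", "good"),
]
BANDS = ["good", "weak", "moderate", "strong"]


def crosstab(rows):
    """Count generators by prior classification and logit band."""
    counts = Counter((prior, band) for _, prior, band in rows)
    return {
        prior: [counts[(prior, band)] for band in BANDS]
        for prior in ("Good", "Bad", "Watch List")
    }


table = crosstab(ROWS)
for prior, row in table.items():
    print(f"{prior:<12}", row, sum(row))
```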

1.  Classified  Generators 

The lone experimental generator classified as bad, number 33, was taken off the
aircraft due to a generator caution light. It has no logistic regression or forest of
trees predictions of bad condition and no HI warnings (Appendix C). The evidence thus
points to a strictly electrical cause of failure, such as a short circuit.

Of the five generators classified as good (numbers 15, 40, 58, 64, 66) in the
experimental data set, four show no bad predictions from either the logistic regression
or forest of trees models. Generator number 15 shows a highly unusual and fairly strong
logistic regression result comparable to generator number 30 of the experimental data set
and generator number 9 of the training set. However, those generators also show bad
predictions from the forest of trees and at least some HI warnings, whereas generator
number 15 has no bad predictions from the forest of trees model (Figure 20).

Figure  20.  Generator  Number  15  (Aircraft  9326493,  Left):  Plot  of  Estimated 
Probability  of  Bad  from  the  Logistic  Regression  Model  with  Smoothing  and  the 
Classification  of  Being  Good  (0)  or  Bad  (1)  from  the  Forest  of  Trees  Model  for  the  Entire 
Two-Year  Acquisition  Period 

A request was sent to the users for additional information concerning the current
state of this generator and whether any maintenance had been performed. A detailed
inspection of maintenance records indicated that the generator had indeed been replaced
during a major maintenance reset in October 2004. This coincides directly with the drop
from strong to weak logit prediction. The logit model has again properly identified a
generator in bad condition.
2. Unclassified "Watch List" Generators

The  logistic  regression  and  forest  of  trees  models  are  then  used  to  predict  the 
status  of  the  generators  of  questionable  status  (watch  list)  in  the  experimental  data  set. 
The  summary  table  and  a  complete  set  of  result  graphs  are  given  in  Appendix  C. 

Generator  Number  30,  which  has  shaft  HI  alarms,  is  predicted  as  bad  fairly 
strongly  by  both  the  logistic  regression  and  forest  of  trees  models.  This  generator  is 
unusual  due  to  the  sharp  increase  in  both  the  predicted  probability  of  bad  and  the  number 
of  instances  of  bad  classification  that  occurs  while  shifting  into  alarm  status  (Figure  21). 
Figure 21. Generator Number 30 (Aircraft 9426545, Left): Plot of Estimated
Probability of Bad from the Logistic Regression Model with Smoothing and the
Classification of Being Good (0) or Bad (1) from the Forest of Trees Model for the Entire
Two-Year Acquisition Period


Generator numbers 21 and 48, which have shaft HI alarms, are predicted bad fairly
strongly by the forest of trees model and weakly by the loess-smoothed logistic regression
model (Figure 22). The high variability of these generators' predictions keeps the loess
curve from climbing, but such high variance can itself be a symptom of impending failure.
Therefore the subjective assessment is made that these are indeed bad generators.


Figure 22. Generator Numbers 21 and 48: Plot of Estimated Probability of Bad from
the Logistic Regression Model with Smoothing and the Classification of Being Good (0)
or Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition Period

Generator number 55 has moderate predictions from both the logistic regression
and the forest of trees models (Figure 23). Interestingly, the logistic regression model
shows an improvement in the generator's state over time, while the forest of trees model
predicts a bad state only sporadically toward the latter portion of the acquisitions.
This shows that the models function differently, even though they tend to agree with
each other.

Figure 23. Generator Number 55: Plot of Estimated Probability of Bad from the
Logistic Regression Model with Smoothing and the Classification of Being Good (0) or
Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition Period

Generator numbers 6 and 24 are on the watch list but are not predicted bad by the
logistic regression or forest of trees models. Notably, generator number 24 does have
some sporadic logit predictions, as seen in Figure 24. The subjective assessment is that
these generators are not bad enough to warrant replacement.

Figure 24. Generator Number 24: Plot of Estimated Probability of Bad from the
Logistic Regression Model with Smoothing and the Classification of Being Good (0) or
Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition Period

V. CONCLUSION


This  thesis  demonstrates  that  a  logistic  regression  model  which  predicts  the 
overall  state  of  a  UH-60L  electrical  generator  can  be  fit  using  IMD-HUMS  data  collected 
with  known  cases  of  failed  generators  and  properly  operating  generators.  Generator  status 
serves  as  the  dependent,  binary  response  variable.  The  independent  predictor  variables 
can  be  chosen  using  correlation  with  the  dependent  variable,  backwards  elimination  using 
p-values,  and  classification  rates.  The  model  is  refined  by  incorporating  new  failures  as 
they  occur  into  the  data  set  and  refitting  to  produce  a  more  sensitive  and  accurate 
prediction  model.  This  results  in  an  accurate  picture  of  a  "bad"  generator  and  generators 
susceptible  to  failure. 

A  random  forest  of  trees  was  also  created  as  a  nonparametric  augmentation  to  the 
prediction  effort.  It  serves  to  quickly  and  automatically  sample  combinations  of  the 
predictors,  aggregating  votes  in  order  to  make  accurate  predictions  which  are  fairly 
robust  to  false  alarms.  A  single  classification  or  regression  tree  can  be  created  as  a 
parallel  effort  in  understanding  the  important  predictors,  helping  during  variable  selection 
for  a  logistic  regression  model. 
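
The vote aggregation mentioned above can be illustrated in a few lines. This is a
generic sketch of majority voting and not the S-Plus forest used in the study. The
function name forest_vote is hypothetical, and sending ties to "good" is an assumption.

```python
def forest_vote(tree_votes):
    """Combine per-tree classifications (0 = good, 1 = bad) by majority vote.

    Ties are resolved as good (0); this tie rule is an assumption.
    """
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    bad_votes = sum(tree_votes)
    return 1 if 2 * bad_votes > len(tree_votes) else 0


# One spurious "bad" vote out of five trees does not change the outcome.
print(forest_vote([0, 0, 1, 0, 0]))
```

Because a single tree reacting to a data spike is outvoted by the others, the aggregate
is less prone to isolated false alarms, which is the robustness described in the text.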


A.  APPLICATIONS 

Due to the highly variable nature of the predictor values, this model is less
successful at predicting states from a single acquisition. In addition, this type of
model may not be able to predict failures of types not included in the model building.
As data is accrued, these previously unobserved failure modes should become increasingly
rare. No effort is recommended to supplant any algorithms currently on board the
aircraft. The model's greatest value may be in the picture it creates of how an at-risk
mechanical component behaves. This technique is easily transferable to other components
on the helicopter as well as to other, completely different, platforms. The appeal of
these models and the process of deriving them is that the relatively accurate state
pictures they produce are attained with minimal effort, time and expense. The only
requirements are an understanding of the system and data set, off-the-shelf statistical
software and a computer.

Concurrent with data collection, the development of prediction models for important
components, for example transmissions or engines, could be initiated. The selection
of pertinent CI predictors should draw not only on an understanding of the system's
mechanics and vibrations but also on parametric and nonparametric statistical
approaches. As more data, including component failures, is collected, the models are
refined. The use of GGobi in detecting different failure modes is a particularly
simple and quick way to investigate the IMD-HUMS data. These real-data based models,
which are easily derived, are pertinent in the move toward Condition Based Maintenance.

For  instance,  periodically  a  serious  defect  is  found  on  one  or  more  single-type 
aircraft  resulting  in  a  grounding  of  the  fleet.  ASAM,  SoFM  and  IRAC  messages  dictate 
specific  inspections  or  corrective  maintenance  actions  which  must  be  accomplished  on 
each  aircraft  prior  to  the  resumption  of  flight  operations.  The  time  required  to  fulfill  the 
requirements  of  these  messages  severely  impacts  both  real-world  and  training  operations. 
In  the  move  toward  CBM,  this  dual  logistic  regression  and  forest  of  trees  process  could 
be  used  to  focus  initial  inspection  efforts  on  only  those  aircraft  whose  “picture” 
resembles  the  problem  aircraft.  The  other  aircraft  could  continue  operations  and  get 
inspected  at  the  next  convenient  maintenance  period. 

Another  practical  application  of  this  process  is  to  reduce  data  collection 
requirements  of  the  onboard  system.  Important  predictor  variables  which  continually 
show  up  in  logistic  regression  and  forest  of  trees  models  would  be  retained  while 
variables  which  never  show  importance  become  candidates  for  removal.  This  would  free 
up  valuable  memory  space  in  the  onboard  system. 


B.  RECOMMENDATIONS  FOR  FURTHER  STUDY 

Aspects critical to the development of better component health prediction models
are the incorporation of variance within the multiple CI, concise variable selection, and
time-series trends.

It is known that an increase in CI variability over time is an indication of
deteriorating component health, but the thresholds between normality and abnormality of
variance for the many CI have not yet been determined. The large data sets now being
produced by IMD-HUMS can be used to estimate the variance of the CI.
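
One simple way to watch CI variability over time is a rolling sample variance. The
sketch below is illustrative only. The function name rolling_variance, the window
length, and the example values are assumptions, not data from the study.

```python
from statistics import variance


def rolling_variance(values, window):
    """Sample variance over each consecutive window of acquisitions."""
    if window < 2:
        raise ValueError("window must cover at least two acquisitions")
    return [
        variance(values[end - window:end])
        for end in range(window, len(values) + 1)
    ]


# Hypothetical CI readings: steady at first, then increasingly erratic.
ci_series = [1.0, 1.0, 1.0, 5.0, 1.0, 9.0]
print(rolling_variance(ci_series, window=3))
```

A threshold on such a series would still have to be estimated from fleet data, which is
the open question raised above.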

Variable selection in component health prediction models is also worthy of further
analysis. If the number of CI can be definitively limited to a few very effective
predictors, the "curse of multi-dimensionality" can be eliminated and component health
distributions can be estimated accurately.

The multiple acquisitions over time for each CI can be used for trend analysis.
Rates of change in the CI values incorporated in the prediction models could ultimately
be used in accurately estimating available component lifetime. The loess smoothing used
in the logistic regression model serves as a primitive attempt to account for trends.
However, the data provides potential to use time series information in a much more
effective manner. Further study of the time series relationships may illuminate factors
which cause the seemingly random oscillations in CI.
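
A rate of change for a single CI can be estimated with an ordinary least-squares slope
over its acquisitions. This is a minimal sketch of that idea and not a method used in
the study. The function name ols_slope and the example numbers are hypothetical.

```python
def ols_slope(times, values):
    """Least-squares slope of values against times (change per unit time)."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need two or more paired observations")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be identical")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


# Hypothetical CI values at four equally spaced acquisitions.
print(ols_slope([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0]))
```

A single global slope is sensitive to the spikes discussed elsewhere in this chapter, so
in practice it would likely be computed on smoothed or windowed data.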

The  further  study  of  variability  and  trends  could  help  in  addressing  the  great  deal 
of  noise  present  in  the  data.  Random  data  spikes  complicate  the  setting  of  thresholds  and 
the  development  of  accurate,  real-time  state  prediction  algorithms.  In  the  logistic 
regression  model,  this  created  the  need  for  loess  smoothing.  While  the  random  forest  of 
trees  is  more  robust  to  false  alarms  caused  by  certain  spikes,  the  Type  II  error  rates  are 
not  known  and  the  model  may  be  too  insensitive. 

Ideally, the best models would determine a component's state and remaining useful
lifetime from a single acquisition. The development of such models requires further
study of the distribution of failures for each component, variability within and among
CI, and trending of CI over time.

APPENDIX A IMD-HUMS SHAFT, GEAR AND BEARING CI


Each  IMD-HUMS  acquisition  concerning  the  shaft,  spur  gear  and  bearings  of  a 
generator  results  in  the  reading  of  the  169  variables  listed  here.  The  response  variables 
and  the  generator  number  are  added  for  this  study.  The  subset  of  65  potential  predictors 
remaining  after  initial  variable  elimination  are  highlighted  in  grey. 


RESPONSE 

status 

status  binary 

SHAFT CI

Component  Name-  shaft 

DateTime 

ah.Tail 

GEN  Number 

Torque 

Airspeed 

Main  Rotor  Speed 
OAT 

MGBTEMP 

Regime 

Opldx 

OpRTR 

OpNPH 

RTRUSG 

NPH 

Health 

PriRAW 

SecRAW 

COMP 

SENS 

Eng1Torque
Eng2Torque
DQ 

XAXIS 

Shaft  Order  1  (IPS) 

Shaft  Order  2  (IPS) 


Shaft  Order  3  (IPS) 

Half  Shaft  Order  (IPS) 

Shaft  Order  1  (OBS) 

Shaft  Order  2  (OBS) 

Shaft  Order  3  (OBS) 
RecomputedHealthIndicator
Shaft  Order  1  (g) 

Shaft  Order  2  (g) 

Shaft  Order  3  (g) 

Half  Shaft  Order  (g) 

Half  Shaft  Order  (OBS) 

Sig  Avg  Peak  to  Peak 
Sig  Avg  RMS 
Health  Indicator 
Sig  Avg  Crest  Factor 
Sig  Avg  Skewness 
Sig  Avg  Kurtosis 
Sig  Avg  Fifth  Moment 
Sig  Avg  Sixth  Moment 
Residual  Peak  to  Peak 
Residual  RMS 
Residual  Crest  Factor 
Residual  Skewness 
Residual  Kurtosis 
Residual  Fifth  Moment 
Residual  Sixth  Moment 
Sig  Avg  LI 
EO  Peak  to  Peak 
EO  RMS 
EO  Crest  Factor 
EO  Skewness 
EO  Kurtosis 
EO Fifth Moment
EO  Sixth  Moment 
Gear  Distributed  Fault 
Resample  Rate 

MeasuredShaft  Speed  Phase  Kurtosis 
EO  LI 

Total  Torque 
Airspeed 

Main  Rotor  Speed 
Engine1GasTurbineSpeed
Engine1PowerTurbineSpeed
Engine1 Torque
Engine2GasTurbineSpeed
Engine2PowerTurbineSpeed
Engine2 Torque

GEAR CI

DateTime 

Tail 

Name-gear 

Health 

PriRAW 

SecRAW 

COMP 

SENS 

DQ 

XAXIS 

Residual  Kurtosis 
Residual  RMS 
Sideband  Mod  1 
Narrowband  CrestFactor 
Gear  Distributed  Fault 
G2-1 

Residual  Peak  to  Peak 
RecomputedHealthIndicator
Sig  Avg  Peak  to  Peak 
Sig  Avg  Kurtosis 
Sig  Avg  RMS 
Residual  Skewness 
Residual  Crest  Factor 
Residual  Fifth  Moment 
Residual  Sixth  Moment 
Gear  Misalignment  1 
Sideband  Mod  2 
sm  3  AS  Sideband  Mod  3 


Health  Indicator 
Gear  Misalignment  2 
Gear  Misalignment  3 
Narrowband  RMS 
Narrowband  Peak  to  Peak 
Narrowband  Skewness 
Narrowband  Kurtosis 
Narrowband  FifthMoment 
Narrowband  Sixth  Moment 
Instantaneous  Frequency 
CSM 

AM  Kurtosis 
Derivative  AM  Kurtosis 
FM  Kurtosis 
Derivative  FM  Kurtosis 
FM  Peak  to  Peak 
G2-2 
G2-3 

BEARING CI

Date. Time 

ah.Tail 

BearingName 

BearingPart 

Health 

PriRAW 

SecRAW 

COMP 

brg.  Priority 

DQ 

XAXIS 

Ball  Energy  (Norm) 

Cage  Energy  (Norm) 

Inner Race Energy (Norm)
Outer  Race  Energy  (Norm) 
Bearing  Energy  15k-20k 
Total  Bearing  Energy  (Norm) 
Envelope  RMS 
Recomputed  Health  Indicator 
Ball  Energy 
Cage  Energy 
Inner  Race  Energy 
Outer  Race  Energy 
Total  Bearing  Energy 
Envelope  Peak  to  Peak 
Envelope Crest Factor
Envelope  Skewness 
Envelope  Kurtosis 
Envelope  Fifth  Moment 
Envelope  Sixth  Moment 
Health  Indicator 
Envelope  Distributed  Fault 
Tone  Energy 
Base  Energy 
Ball  Mod  Cage 
Inner  Race  Mod  Ball 
Inner  Race  Mod  Cage 
Inner  Race  Mod  Outer 


Outer  Race  Mod  Ball 
Outer  Race  Mod  Cage 
TotalBearingCoupling  Energy 
Ball  Mod  Shaft 
Cage  Mod  Shaft 
Inner  Race  Mod  Shaft 
Outer  Race  Mod  Shaft 
TotalShaft-Bearing  Coupling 
Ball  Spin  Ratio 
Cage  Ratio 
Inner  Race  Ratio 
Outer  Race  Ratio 

APPENDIX B TRAINING SET RESULTS SUMMARY


Appendices  B  and  C  give  the  complete  logistic  regression  and  Random  Forest  of 
Classification  Trees  results.  Appendices  D  and  E  give  a  complete  set  of  IMD-HUMS  HI 
for  comparison.  Note  the  change  in  the  method  of  shaft  HI  computation  around  October 
2004. 


Status (binary)
  Bd         denotes generators with proven faults (bad)
  Gd         denotes generators without proven faults (good)

HI: Health Indication provided by IMD-HUMS on-board algorithms
  S          shaft warning (SS denotes alarm status)
  G          gear warning
  B          bearing warning (BB denotes alarm status)

Logit
  strong     loess smoothed values over 0.66
  moderate   loess smoothed values over 0.33
  weak       loess smoothed values between 0 and 0.33
  scattered  logit spikes of 1.0 that do not pull loess curve above 0

Forest
  strong     majority of classifications are 1.0 (bad)
  moderate   minority of classifications are 1.0 (bad)

Helicopter   Generator          Generator  HI
Tail Number  Side       Status  Number     S,G,B    Logit      Forest
9126351      Left       Gd      1          G
9226432      Left       Gd      2          BB
9226435      Left       Gd      3
9226438      Left       Gd      4
9226439      Left       Gd      5
9226443      Left       Gd,Bd   7          G,B      strong     strong
9226446      Left       Gd      8
9226450      Left       Gd      10         B        weak       moderate
9226453      Left       Gd      11         S
9226455      Left       Gd      12         G,B
9326477      Left       Gd      13
9326485      Left       Gd      14         G
9326500      Left       Gd      16
9326506      Left       Gd      17         B
9326507      Left       Gd      18         S,B      weak
9326509      Left       Gd      19                  weak
9326515      Left       Gd      20
9326524      Left       Gd      25         B
9326530      Left       Gd      26
9426533      Left       Gd      27         B
9426534      Left       Gd      28         S,G,B    weak
9426537      Left       Gd      29         G,B      weak
9426549      Left       Gd      32         G
9126351      Right      Gd      35
9226432      Right      Gd      36
9226435      Right      Gd      37         S
9226438      Right      Gd      38
9226439      Right      Gd      39         SS,G,B   scattered  moderate
9226443      Right      Gd      41         G,B      scattered
9226446      Right      Gd      42
9226450      Right      Gd      43         G,BB
9226453      Right      Gd      44         B
9226455      Right      Gd      45         G,B
9326477      Right      Gd      46
9326485      Right      Gd      47         G,BB
9326500      Right      Gd      49         S,G
9326506      Right      Gd      50         G
9326507      Right      Gd      51         B
9326509      Right      Gd      52         G
9326515      Right      Gd      54         G
9326518      Right      Gd      57
9326524      Right      Gd      59         S,G,B
9326530      Right      Gd      60         G
9426533      Right      Gd      61         G
9426534      Right      Gd      62                  weak
9426537      Right      Gd      63         S,G
9426549      Right      Gd      65         SS,G     scattered  moderate
9226450      Left       Bd      9          B        moderate   moderate
9326518      Left       Bd      22         SS       strong     strong
9426549      Left       Bd      31         SS       strong     strong
9326515      Right      Bd      53         SS       strong     strong
9326518      Right      Bd      56         SS       strong     strong

APPENDIX C EXPERIMENTAL SET RESULTS SUMMARY


Helicopter   Generator             Generator  HI
Tail Number  Side       Status     Number     S,G,B    Logit      Forest
9226441      Left       watchlist  6          G
9326493      Left       Gd         15                  strong
9326516      Left       watchlist  21         SS       weak       strong
9326519      Left       watchlist  24         S
9426545      Left       watchlist  30         SS       strong     strong
9926829      Left       Bd         33
9926441      Right      Gd         40         S
9326493      Right      watchlist  48         SS,G     weak       strong
9326516      Right      watchlist  55         S,G      moderate   moderate
9326519      Right      Gd         58         B
9426545      Right      Gd         64         S
9926829      Right      Gd         66         G

APPENDIX D

APPENDIX F TRAINING SET CLASSIFICATION TREE


*** Tree Model ***

Classification tree:
tree(formula = status ~ Shaft.Order.1..IPS. + Shaft.Order.2..IPS. +
    Shaft.Order.3..IPS. + Half.Shaft.Order..IPS. + Gear.Distributed.Fault +
    Residual.Kurtosis + Residual.RMS + Sideband.Mod.1 +
    Narrowband.CrestFactor + G2.1 + Residual.Peak.to.Peak +
    Sig.Avg.Peak.to.Peak + Sig.Avg.Kurtosis + Sig.Avg.RMS +
    Residual.Skewness + Residual.Crest.Factor + Residual.Fifth.Moment +
    Residual.Sixth.Moment + Gear.Misalignment.1 + sm.3.AS.Sideband.Mod.3 +
    Gear.Misalignment.2 + Gear.Misalignment.3 + Narrowband.RMS +
    Narrowband.Peak.to.Peak + Narrowband.Skewness + Narrowband.Kurtosis +
    Narrowband.FifthMoment + Narrowband.Sixth.Moment +
    Instantaneous.Frequency + CSM + AM.Kurtosis + Derivative.AM.Kurtosis +
    FM.Kurtosis + Derivative.FM.Kurtosis + FM.Peak.to.Peak + G2.2 + G2.3 +
    Bearing.Energy.15k.20k + Envelope.RMS + Ball.Energy + Cage.Energy +
    Inner.Race.Energy + Outer.Race.Energy + Total.Bearing.Energy +
    Envelope.Peak.to.Peak + Envelope.Crest.Factor + Envelope.Skewness +
    Envelope.Kurtosis + Envelope.Fifth.Moment + Envelope.Sixth.Moment +
    Envelope.Distributed.Fault + Tone.Energy + Base.Energy +
    Ball.Mod.Cage. + Inner.Race.Mod.Ball + Inner.Race.Mod.Cage +
    Inner.Race.Mod.Outer + Outer.Race.Mod.Ball + Outer.Race.Mod.Cage +
    Total.Bearing.Coupling.Energy + Ball.Mod.Shaft + Cage.Mod.Shaft. +
    Inner.Race.Mod.Shaft + Outer.Race.Mod.Shaft +
    Total.Shaft.Bearing.Coupling, data = CGDNtrainingCUT.65, na.action =
    na.exclude, mincut = 5, minsize = 10, mindev = 0.01)
Variables actually used in tree construction:
[1] "Shaft.Order.1..IPS."  "G2.1"  "Base.Energy"
[4] "Gear.Misalignment.3"  "G2.3"  "Half.Shaft.Order..IPS."
Number of terminal nodes: 7
Residual mean deviance: 0.01577 = 16.29 / 1033
Misclassification error rate: 0.004808 = 5 / 1040
node), split, n, deviance, yval, (yprob)
      * denotes terminal node

 1) root 1040 658.400 G ( 0.09615 0.90380 )
   2) Shaft.Order.1..IPS.<1.72485 970 281.300 G ( 0.03299 0.96700 )
     4) G2.1<38.5724 200 175.900 G ( 0.16000 0.84000 )
       8) Base.Energy<0.655714 154 0.000 G ( 0.00000 1.00000 ) *
       9) Base.Energy>0.655714 46 56.530 B ( 0.69570 0.30430 )
        18) Gear.Misalignment.3<-41.7041 11 0.000 G ( 0.00000 1.00000 ) *
        19) Gear.Misalignment.3>-41.7041 35 20.480 B ( 0.91430 0.08571 )
          38) G2.3<65.3999 7 9.561 B ( 0.57140 0.42860 ) *
          39) G2.3>65.3999 28 0.000 B ( 1.00000 0.00000 ) *
     5) G2.1>38.5724 770 0.000 G ( 0.00000 1.00000 ) *
   3) Shaft.Order.1..IPS.>1.72485 70 18.160 B ( 0.97140 0.02857 )
     6) Half.Shaft.Order..IPS.<0.284577 65 0.000 B ( 1.00000 0.00000 ) *
     7) Half.Shaft.Order..IPS.>0.284577 5 6.730 B ( 0.60000 0.40000 ) *
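
The printed tree can be read as a chain of threshold tests. The sketch below
transcribes its splits into a function that classifies one acquisition as good (G) or
bad (B). It is an illustration of how to read the output, not code from the study. The
function name classify and the dictionary input format are hypothetical, and sending
values exactly equal to a threshold down the right-hand branch is an assumption, since
the printout does not show how ties are handled.

```python
def classify(acq):
    """Classify one acquisition using the splits printed in the tree output.

    acq maps the S-Plus variable names used in the tree to numeric values.
    """
    if acq["Shaft.Order.1..IPS."] < 1.72485:          # node 2
        if acq["G2.1"] < 38.5724:                      # node 4
            if acq["Base.Energy"] < 0.655714:          # node 8 (terminal)
                return "G"
            if acq["Gear.Misalignment.3"] < -41.7041:  # node 18 (terminal)
                return "G"
            # Nodes 38 and 39 split on G2.3 but both predict B.
            return "B"
        return "G"                                     # node 5 (terminal)
    # Nodes 6 and 7 split on Half.Shaft.Order..IPS. but both predict B.
    return "B"


example = {
    "Shaft.Order.1..IPS.": 1.0,
    "G2.1": 30.0,
    "Base.Energy": 0.7,
    "Gear.Misalignment.3": 0.0,
}
print(classify(example))
```

The two lowest splits (on G2.3 and on the half shaft order) change only the class
probabilities, not the predicted class, so they do not appear as separate tests here.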